1 An Introduction to GIS


Introduction 


Geography has always been important. [...] the location of their [...] lived or died by their [...] (Figure 1-1). Spatial information has a great impact on our lives by helping us produce [...] we burn, [...], and the diversions we enjoy. Because spatial information is so important, we have developed tools called geographic information systems (GIS) to help us work with spatial data. We will use GIS to refer to both singular, system, and plural, systems. Some GIS components are purely technological: these include space-age data collectors, advanced communications networks, and sophisticated computing. Other GIS components are very simple, for example, a pencil to field-verify a paper map.

Key to all definitions of a GIS are "where" and "what." GIS record the absolute and relative location of features, as well as the properties and attributes of those features. Mount Everest is in Asia, Timbuktu is in Mali, and the cruise ship Titanic is at the bottom of the Atlantic Ocean. A GIS quantifies these locations by recording their coordinates, numbers that describe position on Earth. The GIS may also note the height of Mount Everest, the population of Timbuktu, or the depth of the Titanic, and many other defining characteristics of spatial features.
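
The pairing of "where" and "what" can be sketched as a small data structure. This is only an illustration, not how any particular GIS stores its data: the `Feature` class and `describe` function are hypothetical names, and the coordinates and attribute values are approximate figures supplied for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A spatial feature: a 'where' (coordinates) plus a 'what' (attributes)."""
    name: str
    latitude: float    # decimal degrees, north positive
    longitude: float   # decimal degrees, east positive
    attributes: dict = field(default_factory=dict)


# Approximate, illustrative values only (assumptions for this sketch).
features = [
    Feature("Mount Everest", 27.99, 86.93, {"region": "Asia", "height_m": 8849}),
    Feature("Timbuktu", 16.77, -3.01, {"country": "Mali"}),
    Feature("Titanic wreck", 41.73, -49.95, {"ocean": "Atlantic", "depth_m": 3800}),
]


def describe(feature):
    """Return a one-line summary combining location and attributes."""
    attrs = ", ".join(f"{key}={value}" for key, value in feature.attributes.items())
    return f"{feature.name} at ({feature.latitude:.2f}, {feature.longitude:.2f}): {attrs}"


for f in features:
    print(describe(f))
```

Keeping the attributes in a free-form dictionary reflects the point that each user decides which characteristics of a feature matter.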


What is a GIS? 


A GIS is a tool for making and using spatial information. Among the many definitions of GIS, we choose:


A GIS is a computer-based system to aid in the collection, maintenance, storage, analysis, output, and distribution of spatial information.

When used wisely, GIS can help us live healthier, wealthier, and safer lives.

Each GIS user may decide what features and attributes are important. For example, forests are good for us. They may protect water supplies, yield wood, harbor wildlife, and provide space to recreate (Figure 1-2). We are concerned about the level of harvest, the adjacent land use, pollution from nearby industries, or where forests burn. Informed management requires knowledge of all these related factors and, perhaps above all, the spatial arrangement of these factors. Buffer strips near rivers may protect water supplies, clearings may prevent the spread of fire, and polluters upwind may harm our forests. A GIS helps us analyze these spatial interactions, and is also particularly useful at displaying spatial data and analysis. A GIS is often the only way to solve spatially-related problems.


Figure 1-2: GIS allow us to analyze important features. The satellite image shows a forested area in western Oregon (courtesy NASA).


Why We Need GIS 


GIS use has become mandatory in many settings. GIS are used to fight crime, protect endangered species, reduce pollution, cope with natural disasters, treat epidemics, and improve public health. GIS are instrumental in addressing some of our most pressing societal problems.


GIS tools in aggregate save billions of dollars annually in the delivery of goods and services. GIS regularly help in the day-to-day management of many natural and man-made resources, including sewer, water, power, and transportation networks, and package delivery. GIS are at the heart of one of the most important processes in U.S. democracy, the decadal redrawing of U.S. congressional districts, and hence the distribution of tax dollars and other government resources.

GIS are needed in part because human consumption has reached levels such that many resources, including air and land, are placing substantial limits on human action. The first 100,000 years of human existence caused scant impacts on the world's resources, but in the past 300 years humans have permanently altered most of the Earth's surface. The atmosphere and oceans exhibit a decreasing ability to benignly absorb carbon dioxide and nitrogen, two of humanity's primary waste products. Silt chokes many rivers, and there are abundant examples of smoke, ozone, or other noxious pollutants substantially harming public health. By the end of the 20th century, most lands south of the boreal region had been farmed, grazed, cut, built over, drained, flooded, or otherwise altered by humans (Figure 1-3).

GIS help us identify and address environmental problems by providing crucial information on where problems occur and who is affected by them. GIS help us identify the source, location, and extent of adverse environmental impacts, and may help us devise practical plans for monitoring, managing, and mitigating environmental damage.

Human impacts on the environment have spurred a strong societal push for the adoption of GIS. Conflicts in resource use, concerns about pollution, and precautions to protect public health have led to legislative mandates that explicitly or implicitly require the consideration of geography. The U.S. Endangered Species Act (ESA) is a good example. The ESA requires adequate protection of rare and threatened organisms. This entails mapping the available habitat and analyzing species range and migration patterns, relative to human land use. GIS use is mandated in other endeavors, including emergency services, flood protection, disaster assessment and management (Figure 1-4), and infrastructure development.


Figure 1-4: GIS may aid in disaster assessment and recovery. This satellite image shows the Camp fire, which consumed most of Paradise, California in November 2018. Emergency response and longer-term efforts may be improved by spatial data collection and analysis (courtesy NASA).


Public organizations have also adopted GIS because it aids in governmental functions. For example, emergency service vehicles are regularly dispatched and routed using GIS. E911 GIS matches the caller's address to the nearest emergency service station, a route is generated based on the street network and traffic, and emergency crews are dispatched in a fraction of the pre-GIS times.

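
The first step of such a dispatch, matching a caller to the nearest station, can be sketched as follows. This is a simplified illustration and not an actual E911 implementation: it uses straight-line (great-circle) distance rather than the street network and traffic described above, and the station names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(caller, stations):
    """Return the (name, lat, lon) station closest to the caller's (lat, lon)."""
    return min(stations, key=lambda s: haversine_km(caller[0], caller[1], s[1], s[2]))


# Hypothetical stations and caller location, for illustration only.
stations = [
    ("Station A", 44.98, -93.27),
    ("Station B", 44.95, -93.09),
    ("Station C", 45.05, -93.30),
]
caller = (44.97, -93.25)

print(nearest_station(caller, stations)[0])
```

A production system would replace `haversine_km` with travel time along the road network, since the closest station in a straight line is not always the quickest to arrive.
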
Many businesses adopt GIS for increased efficiency in the delivery of goods and services. Retail businesses locate stores based on a number of spatially related factors. Where are the potential customers? What is the spatial distribution of competing businesses? Where are potential new store locations? What are traffic flows near current stores, and how easy is it to park near and access these stores? GIS are also used in hundreds of other business applications, to route vehicles, guide advertising, design buildings, plan construction, and sell real estate.


The societal push to adopt GIS has been complemented by a technological pull in the development and application of GIS. Thousands of lives and untold wealth have been lost because ship captains could not answer the simple question, "Where am I?" Remarkable positioning technologies, generically known as Global Navigation Satellite Systems (GNSS), are now indispensable tools in commerce, planning, and safety.

The technological pull has developed on several fronts. Spatial analysis in particular has been helped by faster computers with more storage, and by the increased interconnectedness via mobile networks. Most real-world spatial problems were beyond the scope of all but the largest organizations until the 1990s. GIS computing expenses are becoming an afterthought as costs decrease and performance increases at dizzying rates. Powerful field computers are lighter, faster, more capable, and less expensive, so spatial data display and analysis capabilities may always be at hand (Figure 1-5).


In addition to the computing improvements and the development of GNSS, current "cameras" deliver amazingly detailed aerial and satellite images. Initially, advances in image collection and interpretation were spurred by World War II and then the Cold War because accurate maps were required, but unavailable. Turned toward peacetime endeavors, imaging technologies now help us map food and fodder, houses and highways, and most other natural and human-built objects. Images may be rapidly converted to accurate spatial information over broad areas (Figure 1-6). Many techniques have been developed for extracting information from image data, and also for ensuring this information faithfully represents the location, shape, and characteristics of features on the ground. Visible light, laser, thermal, and radar scanners are currently being developed to further increase the speed and accuracy with which we map our world. Thus, advances in these three key technologies (imaging, GNSS, and computing) have substantially aided the development of GIS.


GIS in Action


Spatial data are widely applied to improve life. Here we describe examples that demonstrate how GIS serves humanity.


Marvin Matsumoto was saved with the help of GIS. The 60-year-old hiker became lost in Joshua Tree National Park, a 300,000-hectare desert landscape famous for its distinct and rugged terrain. Between six and eight hikers become lost there in a typical year, sometimes fatally so. The U.S. National Park Service (NPS) organizes search and rescue operations due to the danger of hypothermia, dehydration, and death (Figure 1-7).

The search for Mr. Matsumoto was guided by GIS. Search teams electronically recorded and transmitted team location and progress. Position data were assimilated at a field GIS center, which then updated maps of the search area. On-site incident managers used these maps to task appropriate teams to unvisited areas. Ground crews could be assigned to complex areas that had been searched by helicopters, but contained vegetation or terrain that limited visibility from above. Marvin was found on the fifth day, alive but dehydrated and with an injured skull and back from a fall. The search team was able to radio its precise location to a rescue helicopter. Another day in the field and Marvin likely would have died, a day saved by the effective use of GIS.

GIS are also widely used in planning and environmental protection. Oneida County is located in northern Wisconsin, a forested area characterized by exceptional scenic beauty. The county is in a region with among the highest concentrations of freshwater lakes in the world, a region that also supports growing permanent and seasonal human populations. Retirees, urban exiles, and vacationers are increasingly drawn to the scenic and recreational amenities available in Oneida County. The seasonal influx almost doubles the total county population each summer.


Population growth has caused a boom in construction and threatened the lakes that draw people to the county. There is a high number of nearshore houses, hotels, or businesses. Seepage from septic systems, runoff from fertilized lawns, or erosion and sediment from construction all decrease lake water quality. Increases in lake nutrients or sediment lead to turbid waters, reducing the beauty and value of the lakes and nearby properties.

In response to this problem, Oneida County implemented a Shoreland Management GIS Project. This project helps protect valuable nearshore and lake resources, and provides an example of how GIS tools are used for water resource management (Figure 1-8).

Oneida County has revised zoning and other ordinances to protect shoreline and lake quality, and to ensure compliance without undue burden on landowners. The county uses GIS technology in the maintenance of property records. Property records include information on the owner, tax value, and any special zoning considerations. The county uses these digital records when creating parcel maps; processing sale, subdivision, or other parcel transactions; and integrating new data such as aerial or boat-based images to help detect property changes and zoning violations.

GIS may also be used to administer shoreline zoning ordinances, or to notify landowners of routine tasks, such as septic system maintenance. Northern lakes are particularly susceptible to nutrient pollution from nearshore septic systems (Figure 1-9). Timely maintenance of each septic system must be verified. The GIS can automatically identify owners out of compliance and generate an appropriate notification.
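
The compliance check described above can be sketched as a query over parcel records. This is a minimal illustration, not the county's actual system: the record fields, the parcel identifiers, and the three-year maintenance interval are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumed maintenance interval; the real ordinance may specify something else.
MAINTENANCE_INTERVAL = timedelta(days=3 * 365)

# Hypothetical parcel records with the date of the last septic service.
parcels = [
    {"parcel_id": "P-001", "owner": "Owner A", "last_service": date(2016, 5, 1)},
    {"parcel_id": "P-002", "owner": "Owner B", "last_service": date(2018, 6, 15)},
    {"parcel_id": "P-003", "owner": "Owner C", "last_service": date(2015, 1, 1)},
]


def out_of_compliance(records, as_of):
    """Return records whose last service is older than the allowed interval."""
    return [r for r in records if as_of - r["last_service"] > MAINTENANCE_INTERVAL]


def notification(record):
    """Build a short notice for the owner of an overdue parcel."""
    return (f"Notice to {record['owner']}: the septic system on parcel "
            f"{record['parcel_id']} is due for maintenance.")


for overdue in out_of_compliance(parcels, as_of=date(2020, 1, 1)):
    print(notification(overdue))
```

In a GIS the same records would also carry parcel coordinates, so overdue systems could be drawn on a map as well as listed.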

GIS has helped the U.S. Fish and Wildlife Service manage the recovery of the Gray Wolf (Canis lupus) in the lower 48 states of the United States (Figure 1-10). Wolves were hunted to a remnant population in northern Minnesota. Given protection in 1974, the population has rebounded to nearly 6,000 wolves that are spread across at least 11 states. GIS helped in many phases of the recovery, including identifying suitable habitat, monitoring pack location through time, mapping prey abundance and areas of high potential conflict with humans due to land use (e.g., ranching), assessing the impacts of range recovery on other resources (deer and other game), and identifying natural limits to range expansion.

Relatively new spatial data capture technologies are used to help in wolf recovery. Animals are tranquilized, fitted with satellite tracking collars, and released (Figure 1-11). These collars may create an hourly record of wolf location, giving precise information on habitat occupancy, movement, hunting vs. resting time, denning sites, and dispersal. More data are provided in a few weeks by these satellite tracking collars than were possible with decades of collection using the older, radio-based technologies they replaced.


Figure 1-9: GIS used in a government function; septic systems are shown.


Figure 1-10: A gray wolf, one of a few successfully recovered species.


Scientists at the Voyageurs Wolf Project have been tracking wolves to better understand their behavior (Figure 1-12). Part of wolf recovery and de-listing may include hunting and trapping seasons in some areas. Harvest isn't allowed in U.S. National Parks, but may be on adjacent lands, e.g., State and National Forests. Removing pack members may affect a pack's ability to group hunt, reproduce, or defend its territory. Wolves may respond to hunting pressure by moving further into parks, in turn displacing adjacent packs. Analysis of pack location and movements during trial hunting and trapping may help guide a sustainable recovery.

GIS are widely used to improve public health. Air pollution is a major cause of sickness and death, primarily from nitrogen and sulfur dioxides, carbon monoxide, ozone, and small particles from oil, gas, coal, and wood combustion. Primary sources are power generation, factories, and transportation (Figure 1-13). Small particles lodge in the lungs, causing inflammation and reducing lung function (Figure 1-14). Alveolar macrophages attempt to isolate this material, but air pollution levels commonly exceed the lung's capacity for self-cleaning. Damaging particle concentrations are typically higher in urban areas, or near traffic, power plants, and other pollution sources. GIS helps map concentrations, identify sources, and plan improvement. Air pollution shaves 10 years off of the life span of about 200,000 people in the United States each year, and is responsible for the death of 7 million people worldwide each year. It also causes increased sickness, hospitalization, and medical costs that annually reach into the billions of dollars. A reduction in air pollution has been shown to significantly reduce hospitalization and childhood asthma, and to increase life expectancy.


Reducing sickness and death requires identifying areas of high exposure, particularly for vulnerable populations. Effective management requires an estimate of how much a decrease in pollution will increase health. Scientists have focused on these questions over the past decades, and can map exposures both over broader areas and at increasing levels of spatial detail.


Air pollution may be mapped from satellites due to its effect on the optical properties of air (Figure 1-15, top). A number of satellite instruments, culminating in the Ozone Mapping and Profiler Suite (OMPS), have been launched over the past 30 years to record air quality. Painstaking engineering, testing, and comparison to ground and airborne measurements have verified instrument accuracies. This has led to a long-term record of pollutant concentrations, and improved understanding of the sources and dynamics of pollutants across regions and the globe. These data allow measurement of peak and chronic exposure to pollutants for different populations. They show persistent areas of high exposure (Figure 1-15, top), some concentrated in cities, largely due to automobile traffic, and others over large areas, e.g., the Midwest, due to large coal-fired power plants and industrial sources. Some areas are particularly prone to high concentrations due to surrounding highlands, e.g., the Central Valley of California or Salt Lake City, Utah.


Work by health scientists has identified the impacts of air pollution on target populations. Increased rates of asthma, lung damage, and death observed in smaller studies or individual cities can be expanded to broader areas through the combination of data in GIS. For example, combining health and population data with satellite exposure records has helped estimate the increase in life expectancy with a decrease in air pollution. Legislation passed in the 1970s resulted in a measurable improvement in air quality across the United States. Progress has been variable across the country, with some populations seeing larger reductions. Scientists measured the decrease in death rates in comparable populations, and estimated an average 2-year increase in life span for each 10 µg m⁻³ reduction in exposure (Figure 1-15, bottom).




Additional work has focused on air pollution at greater geographic detail, in part to better quantify and manage individual exposure and risk. Dr. Julian Marshall and collaborators at the University of Minnesota have developed systems to sample pollutant concentrations at very fine spatial intervals, towing an air sampling system behind a bicycle through a range of traffic densities, road types, and neighborhoods (Figure 1-16).

Satellite positioning was synchronized with video and air samples, and these combined with spatial data on road networks, population density, land use, and other factors. Statistical models were then developed. These allow detailed estimates of pollutant concentrations, even down to the individual street (Figure 1-17). Such estimates may in turn help reduce air pollution, plan bicycle or pedestrian corridors, separate the pollutant loadings due to cars vs. trucks, buses or other large vehicles, and manage traffic or infrastructure to reduce human exposure.


GIS Components


A GIS is composed of hardware, software, data, humans, and a set of organizational protocols. The development and integration of these components is an iterative, ongoing process. The selection and purchase of hardware and software are often the easiest and quickest steps in creating a GIS. Data collection, personnel development, and the establishment of protocols for GIS use are often more difficult and time-consuming endeavors.


Hardware for GIS 


Fast computing, large data storage capacities, and a high-quality, large display form the hardware foundation of most GIS, although the computing and storage are increasingly distributed across networks and into the data processing cloud (Figure 1-18). Fast computing is required because spatial analyses are often applied over large areas at high spatial resolutions. Calculations are often repeated tens of millions of times, corresponding to each space we analyze in our area of interest. Even simple operations may take substantial time on general-purpose computers when applied to large areas, and complex operations can be unbearably long-running. While advances in computing speeds during the past decades have substantially reduced the time required for most spatial analyses, computation times are still long for a few applications.

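
A quick calculation shows how the number of repeated operations grows with area and resolution. The sketch below assumes the study area is divided into a regular grid of square cells; the function names, the 100 km by 100 km area, the 10 m cell size, and the processing rate are illustrative assumptions rather than figures from the text.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square grid cells needed to cover a rectangular area."""
    columns = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * columns


def estimated_seconds(n_cells, operations_per_cell, operations_per_second):
    """Rough run time if every cell needs the same number of operations."""
    return n_cells * operations_per_cell / operations_per_second


# A 100 km by 100 km area analyzed at 10 m resolution.
cells = cell_count(100_000, 100_000, 10)
print(cells)  # 100000000

# Assumed workload: 50 operations per cell at one billion operations per second.
print(estimated_seconds(cells, 50, 1_000_000_000))  # 5.0
```

Halving the cell size quadruples the cell count, which is why fine resolutions over large areas quickly become expensive.
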
Distributed computing is increasingly [...] cellular, and wifi networks. Large data sets may now be stored remotely and served over cellular or wifi networks, and computations may be migrated to local or cloud-based servers (Figure 1-18). This network integration often brings added complexity and new layers of software, and organizations must weigh trade-offs between local and distributed GIS.
While much hardware used in GIS is general-purpose and adaptable for a wide range of tasks, there are also specialized hardware components designed specifically for spatial data collection. GIS require coordinate data to define geographic features, such as roads, rivers, and parcels. GPS/GNSS, field tablets, and other specialized equipment, described in Chapters 4 and 5, aid in data entry. Recent increases in wireless communications substantially improve data input, allowing us to connect these mobile devices to existing GIS while in the field, improving data entry via computers, tablets, phones, and GNSS/GPS units.


GIS Software 


GIS software provides the tools to manage, analyze, and effectively display and disseminate spatial information (Figure 1-19). GIS by necessity involves the collection and manipulation of coordinates. We also must collect qualitative or quantitative information on the nonspatial attributes of geographic features. We need tools to view and edit these data, manipulate them to generate and extract the information we require, and produce the materials to communicate the information we have developed. GIS software provides the specific tools for some or all of these tasks.

There are many public domain and commercially available GIS software packages, and many of these packages originated at academic or government-funded research laboratories. The Environmental Systems Research Institute (ESRI) line of products, including ArcGIS, is a good example. Much of the foundation for early ESRI software was developed during the 1960s and 1970s at Harvard University in the Laboratory of Computer Graphics and Spatial Analysis. Alumni from Harvard included these in commercial products, and have developed additional methods and integrated new academic research in the five decades since.


Open Geospatial Consortium 

We will briefly cover the most common GIS software, but first wish to introduce the Open Geospatial Consortium (OGC). Their efforts have eased sharing across various GIS software packages and computer operating systems. Standards for data formats, documentation, program interactions, and transmission have been developed and published (www.opengeospatial.org), and lists of standards-compliant software compiled. These common standards ease community adoption and reduce barriers to switching among software packages. Compliance with the standards is a plus from a user's perspective.


ArcGIS 


ArcGIS is by some measures the most broadly deployed GIS, and provides a near-comprehensive set of geoprocessing functions, from data entry through analysis to most forms of data output. Desktop, enterprise, and cloud-based versions are provided. ArcGIS supports multiple data types and structures, and literally thousands of possible operations that may be applied to spatial data. Substantial training is required to master the full capabilities of ArcGIS.


ArcGIS provides wide flexibility in how we conceptualize and model geographic features. For example, elevation data may be stored in at least four major formats, each with attendant advantages and limitations. There is equal flexibility in the methods for spatial data processing. This broad array of choices, while responsible for the large investment in time required for mastery of ArcGIS, provides substantial analytical power.


QGIS


QGIS is an open-source software project, an initiative under the Open Source Geospatial Foundation. The software is a collaborative effort by a community of developers and users. QGIS is free, stable, and changes smoothly through time, with the source code available so that it can be extended as needed for specific tasks. It provides a graphical user interface, supports a wide variety of data types and formats, and runs on Unix, Mac OS X, and Microsoft Windows operating systems. As with most open-source software, the original offering had limited capabilities. With an average of approximately two updates a year since 2002, QGIS now provides a large number of basic GIS display and analysis functions. An interface has been developed with GRASS, another open-source GIS with complementary analytical functions, but one that lacks as straightforward a graphical user interface.


GeoMedia 


GeoMedia and related products are the popular GIS suite from Hexagon Geospatial. GeoMedia offers a complete set of data entry, analysis, and output tools. A comprehensive set of editing tools may be purchased, including those for automated data entry and error detection, data development, data fusion, complex analyses, and sophisticated data display and map composition.

GeoMedia also provides a comprehensive set of tools for GIS analyses. Complex spatial analyses may be performed, including queries, for example, to find features in the database that match a set of conditions, and spatial analyses such as proximity or overlap between features. World Wide Web and mobile phone applications are well supported.


Idrisi 


Idrisi is a GIS developed by the Graduate School of Geography of Clark University, in Massachusetts. Idrisi differs from the previously discussed GIS software packages in that it provides both image processing and GIS functions. Image data are useful as a source of information in GIS. There are many specialized software packages designed specifically to focus on image data collection, manipulation, and output. Idrisi offers much of this functionality while also providing a large suite of spatial data analysis and display functions.

A suite of tools for Earth system modeling has been developed on the Idrisi platform, and combined in the TerrSet software system. Functions include land change modeling, habitat and biodiversity modeling, and climate change adaptation.


AUTOCAD MAP 3D 


AUTOCAD is the world's largest-selling computer drafting and design package. Produced by Autodesk, Inc. of San Rafael, California, AUTOCAD began as an engineering drawing and printing tool. A broad range of engineering disciplines are supported, including surveying and civil engineering. Surveyors have traditionally developed and maintained the coordinates for property boundaries, and these are among the most often-used spatial data. AUTOCAD MAP 3D adds substantial analytical capability to the already complete set of data input, coordinate manipulation, and data output tools provided by AUTOCAD.


GRASS 


GRASS, the Geographic Resources Analysis Support System, is a free, open-source GIS that runs on many platforms. The system was originally developed by the U.S. Army Construction Engineering Research Laboratory, but was discontinued by the military, and taken up by an open-source project to maintain and enhance GRASS. The software provides a broad array of raster and vector operations, and is used in both research and applications worldwide. Detailed information and the downloadable software are available at

http://grass.osgeo.org


Bentley Map 


Bentley Systems has developed spatial analysis software from mobile device through enterprise levels, with a strong focus on flexible, integrated infrastructure design and development. Bentley includes a general set of tools, including field data collection, photogrammetry, sophisticated map composition, database management, analysis, and reporting. Bentley products are particularly focused on the built environment, including road, building, utility, and other large construction design, planning, and management. Bentley also supports industry-specific tools, including mining and power generation systems and networks.


Spatial R, Python, and GDAL 


Generic programming, processing, and statistical analysis tools may be combined to provide most GIS functions, and include newer analytical methods not available in common commercial packages. R is an open-source software project with many spatial packages. These support a rich set of spatial operations, particularly for spatial estimation. Python is a general-purpose programming language with several available spatial libraries. Notable among them are Shapely, GeoPandas, and PySAL, containing a large set of spatial functions. GDAL is a standard set of spatial input/output and data processing functions, which may interface with both R and Python. Together, these tools support sophisticated GIS analysis.
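
To give a sense of the kind of spatial function these libraries supply, the sketch below implements a basic point-in-polygon test using only the Python standard library. It is a teaching illustration built on the common ray-casting approach, not code from Shapely or any other package named above, which provide tested and far more complete versions of such predicates. The parcel coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    `polygon` is a list of (x, y) vertices in order, without repeating
    the first vertex at the end. Points exactly on an edge are not
    handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge's y-range?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical square parcel and two test points.
parcel = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, parcel))  # True
print(point_in_polygon(5, 2, parcel))  # False
```

In practice a library implementation is preferable, since it also handles edge cases such as points on boundaries, polygons with holes, and coordinate precision.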


This review of spatial data software is incomplete. There are many other software tools available which provide unique, novel, or particularly clever combinations of geoprocessing functions. Maptitude GIS, Whitebox GAT, MicroImages, Smallworld, Manifold GIS, ILWIS, MapWindow, PCI, and gvSIG are just a few additional software packages with spatial data capabilities. In addition, there are thousands of add-ons, special-purpose tools, or specific modules that complement these products.


18  GIS Fundamentals 


GIS in Organizations 


Although new users often focus on GIS hardware and software components, we must recognize that GIS exist in an institutional context. Effective use of GIS requires an organization to support various GIS activities. Most GIS also require trained people to use them, and a set of protocols guiding how the GIS will be used. The institutional context determines what spatial data are important, how these data will be collected and used, and ensures that the results of GIS analyses are properly interpreted and applied. GIS share a common characteristic of many powerful technologies. If not properly used, GIS may lead to a significant waste of resources, and may do more harm than good. The proper institutional resources are required for GIS to provide all its potential benefits.


GIS are often employed as decision support tools (Figure 1-20). Data are collected, entered, and organized into a spatial database, and analyses are performed to help make specific decisions. The results of spatial analyses in a GIS often uncover the need for more data, and there are often several iterations through the collection, organization, analysis, output, and assessment steps before a final decision is reached. It is important to recognize the organizational structure within which the GIS will operate, and how GIS will be integrated into the decision-making processes of the organization.

One first question is, "What problem(s) are we to solve with the GIS?" GIS add significant analytical power through the ability to measure distances and areas, identify vicinity, analyze networks, and through the overlay and combination of different information. Unfortunately, spatial data development is often expensive, and effective GIS use requires specialized knowledge or training, so there is often considerable expense in constructing and operating a GIS. Before spending this time and money, there must be a clear identification of the new questions that may be answered, or the process, product, or service that will be improved, made more efficient, or less expensive through the use of GIS. Once the ends are identified, an organization may determine the level of investment in GIS that is warranted.


Summary 


GIS are computer-based systems that aid in the development and
use of spatial data. There are many reasons we use GIS, but
most are based on a societal push, our need to more
effectively and efficiently use our resources. GIS use also
responds to a technological pull, our interest in applying new
tools to previously insoluble problems. GIS as a technology is
based on geographic information science, and is supported by
the disciplines of geography, surveying, engineering, space
science, computer science, cartography, statistics, and a
number of others.


GIS are composed of both hardware and software components.
Because of the large volumes of spatial data, GIS hardware
often has large storage capacities, fast computing speed, and
the ability to capture coordinates. Software for GIS are
unique in their ability to manipulate coordinates and
associated attribute data. A number of software tools and
packages are available to help us develop GIS.

While GIS are defined as tools for use with spatial data, we
must stress the importance of the institutional context in
which GIS fit. Because GIS are most often used as decision
support tools, the effective use of GIS requires more than the
purchase of hardware and software. Trained personnel and
protocols for use are required if GIS are to be properly
applied. GIS may then be incorporated in the
question-collect-analyze-decide loop when solving problems.


The Structure of This Book 


This book is designed to serve a semester-long, 15-week course
in GIS at the university level. We seek to provide the
relevant information to create a strong basic foundation on
which to build an understanding of GIS. Because of the breadth
and number of topics covered, students may be helped by
knowledge of how this book is organized. Chapter 1 (this
chapter) sets the stage, providing some motivation and a
background for GIS. Chapter 2 describes basic data
representations. It treats the main ways we use computers to
represent perceptions of geography, common data structures,
and how these structures are organized. Chapter 3 provides a
basic description of coordinates and coordinate systems, how
coordinates are defined and measured on the surface of the
Earth, and conventions for converting these measurements to
coordinates we use in a GIS.


Chapters 4 through 7 treat spatial data collection and entry.
Data collection is often a substantial task and comprises one
of the main activities of most GIS organizations. General data
collection methods and equipment are described in Chapter 4.
Chapter 5 describes Global Navigation Satellite Systems
(GNSS), a common technology for coordinate data collection.
Chapter 6 describes aerial and space-based images as a source
of spatial data. Most historical and contemporary maps depend
in some way on image data, and this chapter provides a
background on how these data are collected and used to create
spatial data. Chapter 7 provides a brief description of common
digital data sources available in the United States,
their formats, and uses.


Chapters 8 through 13 treat the analysis of spatial data.
Chapter 8 focuses on attribute data, attribute tables,
database design, and analyses using attribute data. Attributes
are half our spatial data, and a clear understanding of how we
structure and use them is key to effective spatial reasoning.
Chapters 9, 10, 11, and 12 describe basic spatial analyses,
including adjacency, inclusion, overlay, and data combination
for the main data models used in GIS. They also describe more
complex spatio-temporal models. Chapter 13 describes various
methods for spatial prediction and interpolation. We typically
find it impractical or inefficient to collect “wall-to-wall”
spatial and attribute data. Spatial prediction allows us to
extend our sampling and provide information for unsampled
locations. Chapter 14 describes how we assess and document
spatial data quality, while Chapter 15 provides some musings
on current conditions and future trends.


We give preference to the International System of Units (SI)
throughout this book. The SI system is adopted by most of the
world, and is used to specify distances and locations in the
most common global coordinate systems and by most spatial data
collection devices. However, some English units are culturally
embedded, for example, the survey foot, or 640 acres to a
Public Land Survey Section, and so these are not converted.
Because a large portion of the target audience for this book
is in the United States, English units of measure often
supplement SI units.


Study Questions


1.1 - Why are we more interested in spatial data today than 100 years ago? 


1.2 - You have probably collected, analyzed, or communicated spatial data in one
way or another during the past month. Describe each of these steps for a specific
application you have used or observed.


1.3 - How are GIS hardware different from most other hardware?


1.4 - Describe the ways in which GIS software are different from other computer
software.


1.5 - What are the limitations of using a GIS? Under what conditions might the
technology hinder problem solving, rather than help?


1.6 - Are paper maps and paper data sheets a GIS? Why or why not?


2 Data Models


Introduction 


Data in a GIS represent a simplified view of physical
entities, the roads, mountains, accident locations, or other
features we care about. Our data record spatial location and
nonspatial properties.

Each entity is represented by a spatial object in a GIS,
defining an entity-object correspondence. Because every
computer system has limits, we can’t save the exact shape, or
all characteristics of features. As illustrated in Figure 2-1,
we may store land cover as a set of polygons, with polygons of
at least 30 square meters. We may record data that define each
land cover, vegetation type, ownership, and land use. Smaller
polygons and other characteristics are not included in this
representation.

A data set's spatial detail and essential characteristics are
subjectively chosen by the data developer. The detail required
by a surveyor will be different than that for a land planner.
The essential characteristics of a forest would be different
in the eyes of a logger
than those of a hunter or hiker. No representation is
universally better than any other, and the GIS developer seeks
to define objects that support the intended data uses, at the
desired level of detail and accuracy.

Figure 2-2: Levels of abstraction in the representation of spatial entities. The real world is represented in

A spatial data model (Figure 2-2) may be defined as the
objects in a spatial database plus the relationships among
them. The term “model” is fraught with ambiguity because it is
used in many disciplines to describe many things. Here, a
spatial data model provides a formal means of representing and
manipulating spatially referenced information. In Figure 2-1,
our data model consists of a set of polygons recording the
edges of distinct land uses, and a (not shown) set of
variables associated with each polygon. The data model is the
most recognizable level in our computer abstraction of the
real world. Data structures (how we organize the information
in the computer) and binary machine code (how we record it)
are successively less recognizable but more
computer-compatible forms of the spatial data (Figure 2-2).
Most GIS store our data as a set of layers (Figure 2-3). Each
layer organizes the spatial and attribute data for a given set
of cartographic objects, and these are often referred to as
thematic layers. An example GIS database might include a soils
data layer, a population data layer, an elevation data layer,
and a roads data layer. The roads layer contains only roads
data, including the location and properties of roads in the
analysis area. Information on soils, political boundaries, and
elevation are contained in their respective data layers.
Through analyses we may combine data to create a new data
layer; for example, we may identify areas that have high
elevation and join this information with the soils data.
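The layer combination just described can be sketched in a few lines of Python. This is a minimal illustration only, assuming two hypothetical raster-style layers stored as equally sized nested lists. The names `elevation`, `soils`, and `high_elevation_soils`, the cell values, and the 1000 m threshold are all assumptions made for the example and do not come from any particular GIS package.

```python
# Two hypothetical thematic layers covering the same 3 x 3 grid of cells.
elevation = [
    [1200, 1150, 900],
    [1100, 950, 800],
    [980, 850, 700],
]
soils = [
    ["loam", "loam", "clay"],
    ["sand", "clay", "clay"],
    ["sand", "sand", "silt"],
]


def high_elevation_soils(elevation, soils, threshold):
    """Build a new layer holding the soil type where elevation exceeds
    the threshold, and None everywhere else."""
    return [
        [soil if elev > threshold else None
         for elev, soil in zip(elev_row, soil_row)]
        for elev_row, soil_row in zip(elevation, soils)
    ]


new_layer = high_elevation_soils(elevation, soils, threshold=1000)
for row in new_layer:
    print(row)
```

Because both layers share the same grid, cells are matched by position. Layers with differing extents or cell sizes would first need to be aligned, which this sketch does not attempt.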


Coordinate Systems 

Coordinates are used to define the spatial location and extent
of geographic objects (Figure 2-4). A coordinate most often
consists of a pair or triplet of numbers that specify location
in relation to an origin. The coordinates quantify the
distance from the origin when measured along standard
directions. Spatial data in a GIS most often use coordinate
pairs, X and Y, in a Cartesian coordinate system, named after
Rene Descartes, the system's originator. These pairs define
data on a planar, two-dimensional surface. The two-dimensional
surface is usually based upon standardized methods of mapping
the Earth's curved surface onto a flat map surface (discussed
in Chapter 3). Typically attribute data complement the
coordinate data for cartographic objects. These attribute data
record the non-spatial components of an object, such as a
name, color, pH, or cash value. Keys, labels, or other indexes
are used so that the coordinate and attribute data may be
viewed, related, and manipulated together.


Planar, two-dimensional (2-D) Cartesian coordinate systems
define two orthogonal axes (right angle, or 90°), forming a
plane (Figure 2-5). We specify a Y-axis, usually aligned at or
close to a north-south direction, and an X-axis, usually
aligned at or near an east-west direction. The Y-axis is often
referred to as a northing axis, and values increase upwards in
a grid north direction. The X-axis is often referred to as an
easting axis, with values increasing to the right.


We must be careful when making measurements on our flat, 2-D
surface. These measurements unavoidably distort relative
locations, because the Earth's true surface approximates a
sphere and hence is curved. We can keep errors small by
limiting the area over which we use our flat map. As the
mapped area gets larger, the error increases to amounts we
usually can’t ignore. Specific conversions from 3-D to 2-D
coordinate systems and methods for managing distortion in flat
map systems are discussed in Chapter 3.


Coordinates on a Sphere 


When we map over larger areas or when we need the highest
precision and accuracy, we often use a three-dimensional,
spherical coordinate system. Hipparchus, a Greek mathematician
of the 2nd century B.C., was among the first to specify
locations on the Earth using angular measurements on a sphere.
Our familiar Geographic Coordinate System uses two angles of
rotation, the longitude (λ) and the latitude (φ), and a
radius, R, to specify locations on Earth (Figure 2-6). The
longitude measures east-west distances around the polar Earth
axis. Zero is set for a line that passes through England, and
the angle is positive eastward and negative westward (Figure
2-6). Lines of equal longitude are called meridians, and are
oriented north-south. The zero longitude, also known as the
Prime Meridian or the Greenwich Meridian, was first specified
through the Royal Greenwich Observatory because they had
amassed the best early measurements, but the Prime Meridian
has now shifted about 102 meters (335 feet) east of the
Greenwich Observatory as measurements improved, tectonic
plates shifted, and convention changed.

A second angle of rotation, measured along north-south planes
that intersect the poles, is used to define a latitude.
Latitudes are specified as zero at the Equator, the line
encircling the Earth that is equidistant from the North and
South Poles. By convention, latitudes increase to maximum
values of 90 degrees in the north and south, or, if signed,
from -90 at the South Pole to 90 at the North Pole. Lines of
constant latitude are called parallels (Figure 2-6).

Geographic coordinates do not form a Cartesian system because
the meridians converge. A Cartesian system defines lines on a
right-angle, planar grid. Geographic coordinates occur on a
curved surface, and the longitudinal lines cross at the poles.
This convergence means the distance spanned by a degree of
longitude varies from approximately 111.3 kilometers at the
Equator, to 0 kilometers at the poles. In contrast, the ground
distance for a degree of latitude varies only slightly, from
110.6 kilometers at the Equator to 111.7 kilometers at the
poles. The slight difference with latitude is due to a
non-spherical Earth, something we'll describe a bit later.
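The effect of convergence on a degree of longitude can be approximated with a short calculation. The sketch below assumes a spherical Earth, on which the length of a degree of longitude shrinks with the cosine of the latitude. The function name `km_per_degree_longitude` is our own, and the 111.3 km constant is the approximate Equator value quoted above.

```python
import math

KM_PER_DEGREE_AT_EQUATOR = 111.3  # approximate value quoted in the text


def km_per_degree_longitude(latitude_deg):
    """Approximate ground length (km) of one degree of longitude at a
    given latitude, assuming a spherical Earth."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


for latitude in (0, 30, 60, 90):
    print(latitude, round(km_per_degree_longitude(latitude), 1))
```

At 60 degrees north or south, a degree of longitude spans about half the distance it spans at the Equator, and the value falls to essentially zero at the poles.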

Convergence causes distortion because a degree of latitude
spans a greater distance near the poles than a degree of
longitude. For example, “circles” with a fixed radius in
geographic units, such as 3°, are not circles on the surface
of the globe, with distortion greatest at the poles (Figure
2-7, left). They appear as circles when the spherical surface
is “unrolled” and plotted on a flat map (Figure 2-7, right),
but treating spherical coordinates (latitudes/longitudes) as
Cartesian coordinates creates an inherently distorted map.
Note the distorted shape of Antarctica in Figure 2-7, right.

Because the spherical system for geographic coordinates is
non-Cartesian, planar formulas for area, distance, angles, and
other geometric properties used in a Cartesian coordinate
system should not be used with geographic coordinates. Areas
are usually calculated after converting to a projected system,
described in Chapter 3.

There are two primary conventions used for specifying latitude
and longitude (Figure 2-8). The first uses a leading letter,
N, S, E, or W, to indicate direction, followed by a number to
indicate location. Northern latitudes are preceded by an N and
southern latitudes by an S, for example, N90°, S10°. Longitude
values are preceded by an E or W, for example, W110°.
Longitudes range from 0 to 180 degrees east or west. Note that
the east and west longitudes meet at 180 degrees, so that
E180° equals W180°.


Signed coordinates are the second common way to specify
latitude and longitude. Northern latitudes are positive and
southern latitudes are negative, and eastern longitudes
positive and western longitudes negative. Latitudes vary from
-90 degrees to 90 degrees, and longitudes vary from -180
degrees to 180 degrees. By this convention, the longitudes
“meet” at the maximum and minimum values, so -180° equals
180°.
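Moving from the letter convention to the signed convention is a simple sign change, sketched below. The helper name `to_signed` is hypothetical, and the range checks reflect only the limits stated above (0 to 90 degrees for latitude, 0 to 180 degrees for longitude).

```python
def to_signed(hemisphere, degrees):
    """Convert a letter-prefixed coordinate, such as ("S", 10.0), to a
    signed value: north and east positive, south and west negative."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be N, S, E, or W")
    # Latitudes run 0 to 90 degrees, longitudes 0 to 180 degrees.
    limit = 90 if hemisphere in ("N", "S") else 180
    if not 0 <= degrees <= limit:
        raise ValueError("degrees out of range for this hemisphere")
    return -degrees if hemisphere in ("S", "W") else degrees


print(to_signed("N", 90))
print(to_signed("S", 10))
print(to_signed("W", 110))
```

Note that E180 and W180 convert to 180 and -180, which describe the same meridian, so software comparing longitudes may need to treat these two values as equal.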


Spherical coordinates are often recorded in a
degrees-minutes-seconds (DMS) notation: N43° 35' 20" for 43
degrees, 35 minutes, and 20 seconds of latitude. In DMS, each
degree is made up of 60 minutes of arc, and each minute is in
turn divided into 60 seconds of arc (Figure 2-9). This yields
60 times 60, or 3600 seconds for each degree of latitude or
longitude. Spherical coordinates may also be expressed as
decimal degrees (DD). When using DD, the degrees take the
usual -180 to 180 (longitude) and -90 to 90 (latitude) ranges,
but minutes and seconds are reported as a decimal portion of a
degree (from 0 to 0.99999...). Conversion between DMS and DD
is shown in Figure 2-10.


Figure 2-9: There are 360 degrees in a complete circle, with each degree composed of 60 minutes and each minute composed of 60 seconds.


DD from DMS

DD = D + M/60 + S/3600

e.g.,
DMS = 32° 45' 28"
DD = 32 + 45/60 + 28/3600
   = 32 + 0.75 + 0.0077778
   = 32.7577778


DMS from DD

D = integer part
M = integer of decimal part x 60
S = 2nd decimal x 60

e.g.,
DD = 24.93547
D = 24
M = integer of first decimal x 60
  = integer of 0.93547 x 60
  = integer of 56.1282
  = 56
S = 2nd decimal x 60
  = 0.1282 x 60 = 7.692
so DMS is
24° 56' 7.692"

Figure 2-10: Examples for conversion between DMS and DD expressions.
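The two conversions in Figure 2-10 translate directly into code. The sketch below is one possible arrangement, with function names (`dms_to_dd`, `dd_to_dms`) chosen for this example. It assumes non-negative magnitudes, with the hemisphere letter or sign handled separately.

```python
def dms_to_dd(degrees, minutes, seconds):
    """DD = D + M/60 + S/3600, for a non-negative DMS magnitude."""
    return degrees + minutes / 60 + seconds / 3600


def dd_to_dms(dd):
    """Split a non-negative decimal degree value into
    (degrees, minutes, seconds)."""
    degrees = int(dd)                        # D: integer part
    minutes_full = (dd - degrees) * 60       # first decimal x 60
    minutes = int(minutes_full)              # M: integer of that product
    seconds = (minutes_full - minutes) * 60  # S: second decimal x 60
    return degrees, minutes, seconds


print(round(dms_to_dd(32, 45, 28), 7))
d, m, s = dd_to_dms(24.93547)
print(d, m, round(s, 3))
```

The two calls repeat the worked examples in the figure. Results are rounded only for display, because floating-point arithmetic leaves small residues in the trailing digits.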


Spherical vs. Ellipsoidal Earth


While we often describe the Earth's shape as a sphere, it is
better approximated as an ellipsoid. A sphere is a solid
object defined by a center location and an equal radius in all
directions. An ellipsoid is an approximately spherical solid,
but with unequal radii along the axes. Spheroids and
ellipsoids may be viewed in cross-section, revealing their
difference in shape (Figure 2-11). The Earth's shape is best
viewed as an ellipsoid flattened in the north-south direction.
This flattening is quite small, approximately one part in 300.
Translated to human scales, this is about a 0.8 mm (1/30th of
an inch) flattening in a basketball. While difficult to
observe directly, it is large enough to distort common
geodetic measurements and navigation on the surface of the
Earth. Many navigation and measurement estimates have two sets
of formulas, one an approximation based on a purely spherical
globe, and a more complicated and precise set based on an
ellipsoidal shape.

Note that the words spheroid and ellipsoid are often used
interchangeably. GIS software often asks for an ellipsoid or
spheroid when defining a coordinate projection and then lists
a set of ellipsoids, with differing polar and equatorial
radii.

There have been several estimates of the “best” Earth radii,
best in the sense that the average height of the solid earth
surface is zero when taken across the globe. Eratosthenes of
Cyrene provided a remarkably accurate estimate of the Earth's
radius more than 2,000 years ago, but improvements in
technologies allowed repeated, increasingly accurate
measurements through the late 1900s, when the current values
were adopted.

Today, the best estimate for a, the equatorial radius, is
6,378,137.0 meters (m), and for b, the polar radius,
6,356,752.3 m. A mean value of 6,367,444.7 m is often used for
spheroids, but sometimes the value for a is adopted,
6,378,137 m, or just 6378 km.
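These radii can be checked against the “one part in 300” flattening mentioned earlier. The short sketch below uses the common definition of the flattening factor, f = (a - b) / a. That definition is stated here as an assumption, since the figure that defines it is not reproduced in this text, and the variable names are our own.

```python
A_EQUATORIAL_M = 6_378_137.0  # a, from the text
B_POLAR_M = 6_356_752.3       # b, from the text

# Flattening factor, assuming the common definition f = (a - b) / a.
flattening = (A_EQUATORIAL_M - B_POLAR_M) / A_EQUATORIAL_M
mean_radius_m = (A_EQUATORIAL_M + B_POLAR_M) / 2

print(round(1 / flattening, 3))  # "one part in" this many
print(round(mean_radius_m, 1))   # simple mean of a and b
```

The inverse flattening comes out just under 300, and the simple mean of a and b agrees with the mean value quoted above.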


Converting Arc to Surface Distances


At times we need to calculate the distance on the surface of
the Earth. For example, I might have two locations that differ
by 10 seconds of arc, and wish to estimate the distance
between them. We can approximate the surface distance on a
circle or sphere by the formula:


$$d = r \cdot \theta \qquad (2.1)$$


where d is the approximate ground distance, r is the radius of
the circle or sphere, and θ is the angle of the arc (Figure
2-12). There is a more complicated formula for ellipsoidal
surfaces, but Equation (2.1) is acceptable for many instances
where we need an approximate distance.


Figure 2-12 shows an example calculation of arc length, using
the average radius for Earth. Note that Equation (2.1) applies
to a generic arc angle, measured in the direction of the
spanned arc, without regard to the latitude/longitude system.
Substituting latitude values will result in a reasonably
accurate answer, but substituting longitude values anywhere
but along the Equator will result in an error. This error will
be small near the Equator and largest near the poles, due to
longitudinal convergence. The formula is best used as a first
approximation of distance spanning generic arcs, and not using
longitudinal coordinates unless they are near the Equator.

Note that the angle should be specified in radian measure,
defined as 2π radians per 360 degrees, or approximately
57.2957795 degrees per radian. Radian measures are an
alternative to degrees, and scale the rotation by the radius
of the circle. You may easily convert between radian and
degree units (Figure 2-13). Many spreadsheet, online, and app
programs by default use radian measure, and substituting
degrees will lead to errors.


$d = r \cdot \theta$
where θ is measured in radians
1 radian = 57.2957°

Given an Earth radius of 6,378,137 m, how much distance is
spanned by 10" of arc?

Arc = 10" / 3600 "/1° = 0.00277778°
    = 0.00277778° / 57.2957 degrees per radian
    = 0.000048481435 radians

d = 6,378,137 m x 0.000048481435
  = 309.2 meters

Figure 2-12: Example calculation of the approximate surface distance spanned by an arc.


Converting degrees to radians:
30.1487 degrees is
30.1487 / 57.2957795
= 0.52619 radians

Converting radians to degrees:
1.284 radians is
1.284 x 57.2957795
= 73.5678 degrees

Figure 2-13: Conversion between radian and degree angle measures.
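The arc-length formula and the degree-to-radian conversion can be combined in a few lines. The sketch below is a minimal illustration under the spherical assumption of Equation (2.1). The function name `arc_distance_m` is hypothetical, and the default radius is the 6,378,137 m value used in Figure 2-12.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # radius used in the Figure 2-12 example


def arc_distance_m(arc_degrees, radius_m=EARTH_RADIUS_M):
    """Approximate surface distance d = r * theta, converting the arc
    angle from degrees to radians before multiplying."""
    return radius_m * math.radians(arc_degrees)


ten_arc_seconds = 10 / 3600  # 10 seconds of arc, in degrees
print(round(arc_distance_m(ten_arc_seconds), 1))

# The Figure 2-13 conversions, using the standard library helpers.
print(round(math.radians(30.1487), 5))
print(round(math.degrees(1.284), 4))
```

Passing the angle in degrees and converting inside the function is one way to guard against the degree/radian mix-ups noted above. The first result should agree with the roughly 309 m found in Figure 2-12.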


Also note that the arc angle between two locations is often
not accurately determined by subtracting their geographic
coordinates, due to the convergence of longitudes. This
formula is generally accurate for latitudinal differences, but
only closely approximates longitudinal distances near the
Equator, because the longitude lines converge at the poles. As
they converge, the arc angle θ in the formula becomes less
than the difference in longitude.

The great circle distance formula should be used for the most
accurate estimates of the surface distance between two points
(Figure 2-14). A great circle is defined by any plane that
intersects a globe and passes through its center. All
meridians are great circles, while the Equator is the only
line of constant latitude that is also a great circle. A great
circle distance is the shortest path on the Earth's surface
between two points, and long-distance airline routes
approximate great circles. As with all trigonometric formulas,
you should know if your calculations expect degree or radian
measures as input, and convert accordingly.


Great Circle Distance
Spherical approximation

Consider two points on the Earth's surface,
A with latitude, longitude of (φA, λA), and
B with latitude, longitude of (φB, λB)

The great circle distance between points on a sphere is given
by the formula:

$$d = r \cdot 2 \sin^{-1}\left(\sqrt{\sin^2\left(\frac{\phi_A - \phi_B}{2}\right) + \cos(\phi_A)\cos(\phi_B)\sin^2\left(\frac{\lambda_A - \lambda_B}{2}\right)}\right)$$

where d is the shortest distance on the surface of the Earth
from A to B, r is the Earth's radius, approximately 6378 km,
and the two sine terms use the differences between point
latitudes and longitudes, divided by two.

As an example, the distance between Paris, France, and
Seattle, USA, is:

Latitude, longitude of Paris, France = 48.864716°, 2.349014°

Latitude, longitude of Seattle, USA = 47.655548°, -122.30320°

d = 6378 · 2 · sin⁻¹(√(sin²(0.604584) + cos(48.864716) · cos(47.655548) · sin²(62.326107)))
  = 8,043.6556 km

Figure 2-14: Calculation of the great circle distance between points.
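The calculation in Figure 2-14 can be reproduced with the standard library. The sketch below implements the spherical formula as reconstructed above. The function name `great_circle_km` and the tuple layout are our own choices, and the coordinates are the Paris and Seattle values from the figure, with the western longitude entered as a negative number.

```python
import math


def great_circle_km(lat_a, lon_a, lat_b, lon_b, radius_km=6378.0):
    """Great circle distance on a sphere; inputs in signed decimal
    degrees, converted to radians for the trigonometric functions."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    half_dphi = (phi_a - phi_b) / 2              # half latitude difference
    half_dlam = math.radians(lon_a - lon_b) / 2  # half longitude difference
    h = (math.sin(half_dphi) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(half_dlam) ** 2)
    return radius_km * 2 * math.asin(math.sqrt(h))


paris = (48.864716, 2.349014)
seattle = (47.655548, -122.30320)
print(round(great_circle_km(*paris, *seattle), 1))
```

The result should be close to the roughly 8,044 km shown in the figure. A different choice of Earth radius, or an ellipsoidal formula, would shift the answer by several kilometers.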
Three-Dimensional, Earth-Centered Coordinates

There is an alternate, three-dimensional (3-D) Cartesian
representation of coordinates for locations, on or above Earth
(Figure 2-15). This is commonly used in geodesy, the science
of the Earth's shape, size, and physical dynamics that
underpins all coordinate measures. Geodesy is at the heart of
map projections (Chapter 3) and satellite positioning (Chapter
5), fundamental building blocks of GIS.

The 3-D Cartesian system typically places the origin near or
at the mass center of the Earth. This Cartesian system is
aligned with the Z-axis through the geographic North Pole and
the X and Y axes forming a plane on the Equator (Figure 2-16).
The positive X-axis intersects the ellipsoid where latitude
and longitude values are both zero, and the positive Y-axis
intersects the ellipsoid at a longitude of 90 and latitude of
0.

Mathematical formulas allow us to calculate any X, Y, and Z
given any latitude, longitude, and Earth radii (Figure 2-16).
Each coordinate in the geographic system corresponds to an
X-Y-Z triplet in the 3-D Cartesian coordinate system. These
formulas are commonly used by geodesists in the most precise
surveys, but are also embedded in much of the software that
converts between different versions of our coordinate data.

Figure 2-15: A 3-D Cartesian coordinate system.


There are two different sets of equations for converting from
geographic to 3-D Cartesian coordinates, one assuming a
spherical Earth, and a more accurate one assuming an
ellipsoidal Earth. A detailed discussion of these is best left
for an advanced course, so formulas are included in Appendix C
for reference.
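As a rough illustration of the simpler, spherical case, the sketch below converts latitude and longitude to Earth-centered X, Y, Z values using standard spherical trigonometry. It is not taken from Appendix C. The function name `geographic_to_cartesian` and the single-radius assumption are ours, and the axes follow the alignment described above (X through latitude 0 and longitude 0, Y through latitude 0 and longitude 90, Z through the North Pole).

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # single radius: spherical assumption


def geographic_to_cartesian(lat_deg, lon_deg, radius_m=EARTH_RADIUS_M):
    """Convert a point on a spherical Earth's surface to Earth-centered
    X, Y, Z coordinates in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius_m * math.cos(lat) * math.cos(lon)
    y = radius_m * math.cos(lat) * math.sin(lon)
    z = radius_m * math.sin(lat)
    return x, y, z


print(geographic_to_cartesian(0, 0))    # on the positive X-axis
print(geographic_to_cartesian(0, 90))   # on the positive Y-axis
print(geographic_to_cartesian(90, 0))   # at the North Pole, on the Z-axis
```

The second and third points show tiny nonzero values where exact zeros are expected. This is floating-point residue, not a property of the coordinate system. The ellipsoidal version replaces the single radius with a latitude-dependent term and is not attempted here.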


Geographic and Magnetic North 


There is often confusion between Magnetic North and Geographic
North. Magnetic North and Geographic North do not coincide
(Figure 2-17). Magnetic North is the location towards which a
compass points. The Geographic North Pole is the average
northern location of the Earth's axis of rotation. If you were
standing on the geographic North Pole with a compass, it would
point approximately in the direction of the Bering Straits. In
addition, Magnetic North “wanders” through time, by 100s of
kilometers over the historical record, and has recently
increased its rate of shift (Figure 2-17).

Because Magnetic North and the geographic North Pole are not
in the same place, a compass does not point towards Geographic
North when observed from most places on Earth. The compass
will usually point east or west of Geographic North, defining
an angular difference called the magnetic declination.
Declination varies across the globe, and also has varied
through time as Magnetic North wanders.


Our geographic data are almost never referenced with respect
to Magnetic North. Because Magnetic North wanders, is in a
different direction across the globe relative to constant
lines of latitude and longitude, and can vary substantially
due to local anomalies, it has been used historically for
in-field navigation rather than as a reference against which
coordinates are defined. Geographic North is used because it
is stable through time, predictable, can be reproducibly
measured, and tied to other Earth-defined constants.


Figure 2-17

Note that our definition of Geographic North is the average
northern location of the Earth's axis of rotation. We say
average because the Earth wobbles, or nutates, on its axis.
This means the axis location varies slightly, within a circle
about 9 meters (30 feet) across, so the northern pole location
is always within this circle. The nutation has a period of 433
days, with the pole returning back to its original location
over that time.


Attribute Data and Types 


Attribute data are used to record the non-spatial
characteristics of an entity. Attributes, also called items or
variables, may be envisioned as a list of characteristics that
describe features. Color, depth, weight, owner, vegetation
type, or land use are examples of variables that may appear as
attributes. Attributes record values; for example, a fire
hydrant may be colored red, yellow, or orange, have 1 to 4
flanges, and a pressure rating of any real number from 0 to
12,000.


How attributes are associated with spatial features depends on
the data model we choose, and often differs by the phenomena
we represent. We often represent attributes for continuous
features (temperature, average annual rainfall) in a different
manner than discrete features (property parcels, cell towers).
Common attribute storage methods are provided in the following
section on data models, but we will preview the general
outlines here.

While there are many ways to store attribute data, two general
conceptualizations are most common. In the first, attributes
are often presented in tables and arranged in rows and columns
(Figure 2-18). Each row corresponds to a spatial object, and
each column corresponds to an attribute. Tables are often
organized and managed using a specialized computer program
called a database management system (DBMS, described more
fully in Chapter 8). Data are often stored in a primary table
that is closely linked to the spatial coordinates that also
characterize features, with additional tables that may be
linked to this primary table. This structure is common when we
represent discrete features.
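The row-and-column arrangement, and its link to coordinates through a shared key, can be sketched with plain Python dictionaries. Everything in the example is hypothetical: the hydrant IDs, coordinates, and attribute values are invented, and a real GIS would typically hold the table in a DBMS and not in memory.

```python
# Hypothetical hydrant layer. The feature ID is the key that links
# each coordinate pair to its row in the attribute table.
coordinates = {
    101: (512340.0, 4980120.0),
    102: (512410.5, 4980075.2),
    103: (512298.7, 4980210.9),
}
attribute_table = {
    101: {"color": "red", "flanges": 3, "pressure": 5500.0},
    102: {"color": "yellow", "flanges": 2, "pressure": 4200.0},
    103: {"color": "red", "flanges": 4, "pressure": 6100.0},
}


def feature_record(feature_id):
    """Join the coordinates and attributes that share one feature ID."""
    x, y = coordinates[feature_id]
    return {"id": feature_id, "x": x, "y": y, **attribute_table[feature_id]}


def select_ids(column, value):
    """Return the IDs of features whose attribute equals the value."""
    return sorted(
        fid for fid, row in attribute_table.items() if row[column] == value
    )


print(feature_record(102))
print(select_ids("color", "red"))
```

Each dictionary entry in `attribute_table` plays the role of a row, and each inner key plays the role of a column. Selecting rows by attribute returns IDs that can then be used to retrieve the matching coordinates.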


Attributes for continuously changing variables are often
conceptualized and stored more simply, as a collection of
single values assigned to regularly-spaced locations, e.g.,
elevation mapped on a grid (Figure 2-19). There may not be a
table associated with the layer, but if one exists, the table
often contains summary information about the layer, such as
number of observations or average value of the variable,
rather than an explicit linkage between a table entry and a
specific geographic location.

All attributes can be categorized as nominal, ordinal, or
interval/ratio attributes. Nominal attributes are variables
that provide descriptive information about an object. The
color is recorded for each hydrant in Figure 2-18. Other
examples of nominal data are vegetation type, a city name, the
owner of a parcel, or soil series. There is no implied order,
size, or quantitative information contained in the nominal
attributes.


Nominal attributes may also be images, film clips, audio
recordings, or other descriptive information. For example, GIS
for real estate often have images of the buildings as part of
the database. Image, video, or sound recordings stored as
attributes are sometimes referred to as “BLOBS” for binary
large objects.

Ordinal attributes imply a ranking by their values. An ordinal
attribute may be descriptive, such as high, mid, or low, or it
may be numeric; for example, an erosion class with values from
1 to 10. The order reflects only rank, and not scale. An
ordinal value of four has a higher rank than two, but we can't
infer that the attribute value is twice as large.

Interval/ratio attributes are used for numeric items where
both rank order and absolute difference in magnitudes are
represented, for example, the number of flanges in the second
column of Figure 2-18. These data are often recorded as real
numbers on a linear scale. Area, length, weight, height, or
depth are a few examples of attributes that are represented by
interval/ratio variables.

Items have a domain, a range of values they may take. Colors might be
restricted to red, yellow, and green; cardinal direction to north,
south, east, or west; and size to all positive real numbers.

The attribute type and domain define the kind of operations that may
be appropriately applied to that attribute. It usually makes little
sense to calculate the average value of a nominal attribute, e.g.,
the average name for a set of counties or average color of an image.
Misapplying functions meant for interval/ratio attributes to nominal
or ordinal attributes is among the most common errors in spatial data
analysis.
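The link between attribute type and appropriate operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize`, the type labels, and the sample values are hypothetical, and the choice of mode, median, and mean is one common convention rather than a rule taken from any particular GIS.

```python
from statistics import mean, median, mode

def summarize(values, attribute_type):
    """Return a summary statistic suited to the attribute type."""
    if attribute_type == "nominal":
        # Categories have no order or magnitude, so only counting is valid.
        return mode(values)
    if attribute_type == "ordinal":
        # Rank is meaningful, so a middle rank (median) is defined.
        return median(values)
    if attribute_type == "interval/ratio":
        # Differences in magnitude are meaningful, so the mean is valid.
        return mean(values)
    raise ValueError(f"unknown attribute type: {attribute_type}")

hydrant_colors = ["red", "green", "red", "yellow"]  # nominal (hypothetical)
erosion_classes = [2, 4, 4, 7, 9]                   # ordinal, classes 1 to 10
flange_counts = [2, 3, 2, 4]                        # interval/ratio

print(summarize(hydrant_colors, "nominal"))
print(summarize(erosion_classes, "ordinal"))
print(summarize(flange_counts, "interval/ratio"))
```

Asking for the mean of `hydrant_colors` would fail outright, but asking for the mean of numeric class codes would run without complaint, which is why the attribute type must be tracked by the analyst rather than inferred from the stored values.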


We may perform inappropriate calculations on interval/ratio data,
sometimes quite subtly so. Consider a highly accurate temperature
measurement grid, with data saved to the nearest 0.00000001 degrees.
The mathematical mode, the most frequent value in the data set, may
not be informative because high precision means that almost all
values have very low frequency. Most temperatures may occur one or
two times in the data, e.g., 22.00000000 and 22.00000001 degrees are
considered different. It may be more appropriate to bin the data into
0.10-degree intervals and then calculate the mode. One must bear in
mind attribute data type, domain, and characteristics when
manipulating spatial data.
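The binning idea can be sketched as follows. The function `binned_mode` and the six sample temperatures are hypothetical, and the rounding step is an assumption added to guard against floating-point error at bin edges.

```python
from collections import Counter
import math

def binned_mode(values, bin_width):
    """Return the lower edge of the most frequently occupied bin."""
    def bin_index(value):
        # Round before flooring so that 22.1 / 0.1 lands in bin 221,
        # not 220, despite floating-point representation error.
        return math.floor(round(value / bin_width, 6))
    counts = Counter(bin_index(v) for v in values)
    most_common_index, _ = counts.most_common(1)[0]
    return round(most_common_index * bin_width, 6)

temperatures = [22.00000000, 22.00000001, 22.04, 22.13, 22.18, 21.97]

raw_counts = Counter(temperatures)
print(max(raw_counts.values()))        # highest frequency among raw values
print(binned_mode(temperatures, 0.1))  # lower edge of the fullest 0.1 bin
```

With these sample values every raw temperature occurs only once, so the raw mode carries no information, while the binned mode identifies the 0.1-degree interval that holds the most observations.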
Common Spatial Data Models


All spatial data models are based on a conceptualization. As an
example, consider a regional map that defines roads as lines. We
conceive of each road as a linear feature that fits into a small
number of categories. These lines connect cities and towns that are
shown as discrete points or polygons on the map. Road properties may
include only the road type, e.g., highway or local road. The roads
have a width represented by a line symbol on the map; however, the
scaled road width may not represent the true road width. All state
highways are represented equally although they may vary. Some may
have wide shoulders, others not, or dividing barriers of concrete
versus a broad vegetated median, but we may choose to omit this
variation, fitting all highways into one class.

There are two main conceptualizations used for digital spatial data.
The first defines discrete objects using a vector data model. This
model uses discrete elements such as points, lines, and polygons to
represent the geometry of real-world entities (Figure 2-20, left).

Farm fields, roads, wetlands, cities, and census tracts are examples
of entities that are often represented by discrete vector objects.
Points are often used to define the locations of “small” objects such
as wells, buildings, or ponds. Lines may be used to represent linear
objects, for example, rivers or roads, or to enclose polygons, which
identify area objects. Starting points and ending points for a line
are sometimes referred to as nodes, while intermediate points in a
line are referred to as vertices.

Vector objects are discrete. A forest may share an edge with a
pasture, and this boundary is represented by lines. In truth, a
forest edge may grade into a mix of trees and shrubs, then shrubs and
grass, then pure grass; however, in the vector conceptualization, a
line between two land cover types will be drawn to indicate a
discrete, abrupt transition. Lines and points have coordinate
locations, but points have no dimension, and lines have no dimension
perpendicular to their direction. Area features are defined by a
closed, connected set of lines.


The second common conceptualization identifies and represents grid
cells for a given region of interest. This conceptualization employs
a raster data model (Figure 2-20, right). Raster cells are arrayed in
a row and column pattern to provide “wall-to-wall” coverage of a
study region. Cell values are used to represent the type or quality
of mapped variables. Raster models are often used with variables that
may change continuously across a region. Elevation, mean temperature,
slope, average rainfall, cumulative ozone exposure, or soil moisture
are examples of phenomena that are often represented as continuous
fields. Raster representations are also sometimes used to represent
discrete features, for example, class maps of vegetation or political
units.


Data models are often interchangeable in that many phenomena may be
represented by many data models. For example, elevation may be
represented as a raster surface (continuous field) or as a series of
lines representing contours of equal elevation (discrete objects).
Data may be converted from one model to another; for example, the
location of contour lines may be determined by evaluating the raster
surface, or a raster data layer may be derived from a set of contour
lines. These conversions entail some costs, both computationally and
perhaps in data accuracy.

The decision to use either a raster or vector model often depends on
our conceptualization of the objects modeled and the most frequent
operations performed. We think of elevation as a continuous variable,
and slope is more easily determined when elevation is represented in
a raster data set. However, discrete contours are often the preferred
format for printed maps, so the discrete conceptualization of a
vector data model may be preferred in some cases. The best data model
for a given application depends on the most common operations, the
experiences and views of the GIS users, the form of available data,
and the influence of the data model on data quality.

Other, less common data models are sometimes used. A triangulated
irregular network (TIN) is one such model, employed to represent
surfaces such as elevations through a combination of point, line, and
area features. We will introduce and discuss less common data models
later in this chapter.


Vector Data Models


A vector data model uses sets of coordinates and associated attribute
data to define discrete objects. Groups of coordinates define the
location and boundaries of discrete objects, and these coordinate
data plus their associated attributes are used to create vector
objects representing the real-world entities (Figure 2-21). In the
most common vector models, there is an attribute table associated
with each vector layer, and a single row in the table corresponding
to each feature in the data layer. These vector layers are said to
contain single-part features, because there is a single geographic
object for each row in the table, with one to several columns in each
row. All values in a column have the same type, so for any given
column, all entries might be ordinal, or interval/ratio, or a BLOB,
or some other defined type. An identifier value, or ID, is typically
included, and this value is often unique within the table, with an
unrepeated value assigned for each row and corresponding feature.
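One way to picture the pairing of geometry and table rows is the following Python sketch. It is not the storage format of any particular GIS; the `Feature` class, its field names, and the light pole values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    feature_id: int     # unique ID linking the geometry to its table row
    geometry_type: str  # "point", "line", or "polygon"
    coordinates: list   # ordered list of (x, y) coordinate pairs
    attributes: dict = field(default_factory=dict)

# A single-part point layer: one geographic object per table row.
light_poles = [
    Feature(1, "point", [(10.0, 20.0)], {"height_m": 9.0, "power": "grid"}),
    Feature(2, "point", [(35.5, 18.2)], {"height_m": 7.5, "power": "solar"}),
]

def attribute_table(layer):
    """Build the attribute table: one row per feature, keyed by ID."""
    return {feature.feature_id: feature.attributes for feature in layer}

table = attribute_table(light_poles)
print(table[2]["power"])
```

Because the ID is unique, the table has exactly as many rows as the layer has features, and every value in a given column (for example `height_m`) shares one type.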

There are three basic types of vector objects: points, lines, and
polygons (Figure 2-21). A point uses a single coordinate pair to
represent the location of an entity that is considered to have no
dimension. Gas wells, light poles, accident locations, and survey
points are examples of entities often represented as point objects.
Some of these have real physical dimension, but for the purposes of
the GIS users, they may be represented as points. In effect, this
means the size or dimension of the entity is not important, only its
location.

Attribute data are attached to each point, and these attribute data
record the important non-spatial characteristics of the point
entities (Figure 2-21, top). When using a point to represent a light
pole, important attribute information might be the height of the
pole, the type of light and power source, and the last date the pole
was serviced.

Linear features are represented as lines in vector data models
(Figure 2-21, mid). Lines are most often represented as an ordered
set of coordinate pairs. Each line is made up of line segments that
run between adjacent coordinates in the ordered set. Attributes in a
table correspond to line segments. Curved linear entities are most
often represented as a collection of short, straight line segments,
although curved lines are at times represented by a mathematical
equation describing a geometric shape. The line starting and ending
points are often called nodes, and intermediate points used to
represent the line shape are called vertices.

Figure 2-21: An example of the most common vector data model structures.


Area entities are most often represented by closed polygons (Figure
2-21, bottom). These polygons are formed by a set of connected lines,
either one line with an ending point that connects back to the
starting point, or a set of lines connected start-to-end. Polygons
have an interior region and may entirely enclose other polygons in
this region. Polygons may be adjacent to other polygons and thus
share “bordering” or “edge” lines with other polygons. Attribute data
such as area, perimeter, land cover type, or county name may be
linked to each polygon.
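Area and perimeter attributes can be derived directly from a closed coordinate ring. The sketch below assumes a simple polygon without holes, stored as one line whose last point repeats the first; the function names and the rectangular parcel are hypothetical, and the area calculation uses the standard shoelace formula.

```python
import math

def is_closed(ring):
    """A ring is closed when its ending point repeats its starting point."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def polygon_area(ring):
    """Area of a simple closed ring by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(ring):
    """Sum of the straight segment lengths around the ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))

# Hypothetical 40 by 30 unit parcel, listed counterclockwise.
parcel = [(0, 0), (40, 0), (40, 30), (0, 30), (0, 0)]

print(is_closed(parcel))
print(polygon_area(parcel))
print(polygon_perimeter(parcel))
```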
Note that there is no uniformly superior way to represent features,
and we may represent the same features as points, lines, or polygons
(Figure 2-22). Some feature types may appear to be more “naturally”
represented one way: manhole covers as points, roads as lines, and
parks as polygons. However, in a very detailed data set, the manhole
covers may be represented as circles, and both edges of the roads may
be drawn and the roads represented as polygons. The best
representation depends on the detail, accuracy, and intended use of
the data set.

Vector layers sometimes have a many-to-one relationship between
geographic features and table rows (Figure 2-23), defining multi-part
features. In these instances, many spatially distinct features are
matched with a row, and the row attributes apply to all the distinct
features. This is common when representing islands, groups of
buildings, or other clusters of features that make up a perceived
whole thing. These multi-part features better represent our
perceptions of entities in some contexts.

Figure 2-22: Multiple representations: buildings as point, line, or
area features in a data layer.

Multi-part features may also be used for large data sets, for
example, when millions of point observations are collected
automatically with laser scanners. Tables are often slower to process
than point geographies, and so reducing the table size by grouping
points into multi-part features may shorten many operations.

Care is warranted when converting multi-part features to single-part
features. The most common problems arise for aggregate variables in
polygon layers, such as total counts. For example, population data
are often delivered by census areas such as states. Many states,
e.g., Hawaii, have several parts and are represented by a multi-part
shape. The population is associated with the aggregated set of
polygons comprising the state (Figure 2-24). When converted to
single-part shapes, the attributes are often copied for each
component polygon. In our example, all single-part polygons will be
assigned the attribute values for the multi-part feature, in effect
repeating counts for each part. Subsequent aggregation or calculation
across the population column may result in error.

Attributes for converted shapes may be corrected. If component data
are available, they can be assigned to each of the single-part
features. If not, then some weighting scheme may be available, for
example, if there is a correlation between area and count. Until they
are reviewed and appropriately adjusted, single-part attributes
derived from multi-part features should be used with caution.
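The double-counting problem, and one possible correction, can be sketched as follows. The three-part feature, its areas, and its population of 1,000 are hypothetical, and weighting by area is only one assumed scheme; it is reasonable only where count and area are in fact correlated.

```python
def explode_naive(parts, total_count):
    """Copy the multi-part count to every part (the problematic default)."""
    return [{"name": name, "area": area, "count": total_count}
            for name, area in parts]

def explode_area_weighted(parts, total_count):
    """Split the count among parts in proportion to their area."""
    total_area = sum(area for _, area in parts)
    return [{"name": name, "area": area,
             "count": total_count * area / total_area}
            for name, area in parts]

# Hypothetical multi-part feature: (part name, area) pairs.
parts = [("island A", 60.0), ("island B", 30.0), ("island C", 10.0)]

naive = explode_naive(parts, 1000)
weighted = explode_area_weighted(parts, 1000)

print(sum(row["count"] for row in naive))     # inflated total
print(sum(row["count"] for row in weighted))  # total is preserved
```

Summing the naive column triples the population because each of the three parts carries the full count, while the weighted column sums back to the original total.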


Polygon Inclusions and Boundary Generalization


Vector data frequently exhibit two characteristics: polygon
inclusions and boundary generalization. These characteristics are
often ignored, but may affect the use of vector data. These concepts
must be understood, their presence evaluated, and effects weighed in
the use of vector data sets.


Polygon inclusions are areas in a polygon that are different from the
rest of the polygon but still part of it. Inclusions occur because we
typically assume an area represented by a polygon is homogeneous, but
this is often untrue, as illustrated in Figure 2-25. The figure shows
a vector polygon layer representing raised landscaping beds (a). The
general attributes for the polygon may be coded; for example, the
surface type may be recorded as cedar mulch. The area noted in Figure
2-25b shows a walkway that is an inclusion in a raised bed. This
walkway has a concrete surface. Hence, this walkway is an unresolved
inclusion within the polygon.

One solution creates a polygon for each inclusion. This often is not
done because it may take too much effort to identify and collect the
boundary location of each inclusion, and there typically is some
lower limit, or minimum mapping unit, on the size of objects we care
to record in our data. Inclusions are present in some form in many
polygon data layers.

Boundary generalization is the incomplete representation of boundary
locations. This problem stems from the typical way we represent
linear and area features in vector data sets. As shown in Figure
2-25c, polygon boundaries are represented as a set of connected
straight-line segments. The segments are a means to trace the
boundaries separating different area features. For curved lines,
these straight-line segments may be viewed as a sampling of the true
curve, and there is typically some deviation of the line segment from
the “true” curved boundary. The amount of generalization depends on
many factors, and should be so small as to be unimportant for any
intended use of the spatial data. However, since many data sets may
have unforeseen uses or may be obtained from a third party, the
boundary generalization should be recognized and evaluated relative
to the specific requirements of any given spatial analysis. There are
additional forms of generalization in spatial data, and these are
described more thoroughly in Chapter 4.
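The size of the deviation can be estimated for an idealized case. The sketch below assumes a circular boundary sampled by equally spaced vertices, so each straight segment is a chord of the circle; the function name and the 100-unit radius are hypothetical. The largest gap between chord and arc is then r(1 - cos(pi/n)) for n segments.

```python
import math

def max_chord_deviation(radius, n_segments):
    """Largest gap between a circle and an inscribed regular n-gon."""
    return radius * (1.0 - math.cos(math.pi / n_segments))

# Deviation shrinks quickly as more vertices sample the curve.
for n in (8, 16, 32, 64):
    print(n, round(max_chord_deviation(100.0, n), 3))
```

Doubling the number of vertices cuts the deviation to roughly one quarter, which gives a rough guide to how densely a curved boundary must be sampled for a given accuracy requirement.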
Vector Topology


Vector data often contain vector topology, enforcing strict
connectivity and recording adjacency and planarity. Early systems
employed a spaghetti data model (Figure 2-26a), in which lines may
not intersect when they should, and may overlap without connecting.
The spaghetti model severely limits spatial data analysis and is
little used except for very basic data entry or translation.
Topological models (Figure 2-26b) create an intersection and place a
node at each line crossing, record connectivity and adjacency, and
maintain information on the relationships between and among points,
lines, and polygons in spatial data. This greatly improves the speed,
accuracy, and utility of many spatial data operations.

Topological properties are conserved when converting vector data
among common coordinate systems, a common practice in GIS analysis
(described in Chapter 3). Polygon adjacency is an example of a
topologically invariant property, because the list of neighbors for
any given polygon does not change during geometric stretching or
bending (Figure 2-26, b and c). These relationships may be recorded
separately from the coordinate data.


Figure 2-26: a) spaghetti, b) topological


Topological vector models may vary, and enforce particular types of
topological relationships. Planar topology requires that all features
occur on a two-dimensional surface. There can be no overlaps among
lines or polygons in the same layer (Figure 2-27). When planar
topology is enforced, lines may not cross over or under other lines.
At each line crossing there must be an intersection. The left side of
Figure 2-27 shows nonplanar graphs. In the top left figure, four line
segments coincide. At some locations the lines intersect at a node,
shown as white-filled circles, but at some locations a line passes
over or under another line segment. These lines are nonplanar. The
top right of Figure 2-27 shows planar topology enforced for these
same four line segments. Nodes are found at each line crossing.
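Enforcing planarity for lines amounts to finding each crossing and splitting the lines there. The following sketch handles only two straight segments that cross at an interior point; the function names are hypothetical, and parallel, collinear, and endpoint-touching cases are deliberately left out.

```python
def segment_intersection(p1, p2, p3, p4):
    """Return the crossing point of segments p1-p2 and p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear: not handled in this sketch
    # t and u are the fractional positions of the crossing along each segment.
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def planarize(line_a, line_b):
    """Split two crossing segments at a new node placed at the crossing."""
    node = segment_intersection(*line_a, *line_b)
    if node is None:
        return [line_a, line_b]
    return [(line_a[0], node), (node, line_a[1]),
            (line_b[0], node), (node, line_b[1])]

pieces = planarize(((0, 0), (4, 4)), ((0, 4), (4, 0)))
print(len(pieces), pieces[0][1])
```

Two nonplanar lines become four planar ones that all meet at the shared node, which is the situation shown for line crossings in a planar topology.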
Polygons can also be nonplanar, as shown at the bottom left of Figure
2-27. Two polygons overlap slightly. This may be due to an error,
where the two polygons share a boundary but have been recorded with
an overlap, or there may be two areas that overlap in some way. If
topological planarity is enforced, these two polygons must be
resolved into three separate, non-overlapping polygons. Nodes are
placed at the intersections of the polygon boundaries (lower right,
Figure 2-27).

Figure 2-27: Nonplanar and planar topology in lines and polygons.


There are additional topological constraints besides planarity that
may be specified. For example, polygons may be exhaustive, in that
there are no gaps, holes, or “islands” allowed. Line direction may be
recorded, so that a “from” and “to” node are identified in each line.
Directionality aids the representation of river or street networks,
where there may be a natural flow direction.

There is no uniform set of topological 
relationships that are included in all topolog- 
ical data models. Different vendors have 
incorporated different topological information
in their data structures. Planar topology
is often included, as are representations of 
adjacency (which polygons are next to 
which) and connectivity (which lines con- 
nect to which). 


Some GIS software packages create and maintain detailed topological
relationships in their data. This results in more complex and perhaps
larger data structures, but access is often faster, and topology
provides more consistent, “cleaner” data. Other systems maintain
little topological information in the data structures, but compute
and act upon topology as needed during specific processing.

Topology may also be specified between layers, because we may wish to
enforce spatial relationships between entities that are stored
separately. As an example, consider a data layer that stores property
lines (cadastral data), and a housing data layer that stores building
footprints (Figure 2-28). Rules may be specified that prevent
polygons in the housing data layer from crossing property lines in
the cadastral data layer. A violation would indicate a building that
crosses a property line. Most such instances occur as a result of
small errors in data entry or misalignment among data layers.
Topological restrictions between two data layers avoid these
inconsistencies. Exceptions may be granted in those few cases when a
building truly does cross property lines.

There are many other types of topological constraints that may be
enforced, both within and between layers. Dangles, lines that do not
connect to other lines, may be proscribed, or limited to be greater
or less than some threshold length. Lines and points may be required
to coincide, for example, water pumps as points in one data layer and
water pipes as lines in another, or lines in separate layers may be
required to intersect or be coincident. While these topological rules
add complexity to vector data sets, they may also improve the logical
consistency and value of these data.

Topological vector models often use codes and tables to record
topology. As described above, nodes are the starting and ending
points of lines. Each node and line is given a unique identifier.
Sequences of nodes and lines are recorded as a list of identifiers,
and point, line, and polygon topology are recorded in a set of
tables. The vector features and tables in Figure 2-29 illustrate one
form of this topological coding.


Many GIS software systems are written such that the topological
coding is not visible to users, nor directly accessible by them.
Tools are provided to ensure the topology is created and maintained;
that is, there may be directives that require that polygons in two
layers do not overlap, or that ensure planarity for all line
crossings. However, the topological tables these commands build are
often quite large, complex, and linked in an obscure way, and are
therefore hidden from users.


Point topology is often quite simple. Points are typically
independent of each other, so they may be recorded as individual
identifiers, perhaps with coordinates included, and in no particular
order (Figure 2-29, top).

Line topology typically includes sub- 
stantial structure and identifies at a mini- 
mum the beginning and ending points of 
each line (Figure 2-29, middle). Topology
may be organized in tables, including line 
identifiers, starting nodes, and ending nodes 
for lines. Lines may be assigned a direction, 
and the polygons to the left and right of the 
lines recorded. 


Polygon topology may also be defined 
by tables (Figure 2-29, bottom). The tables 
may record the polygon identifiers and the 
ordered list of connected lines that define the 
polygon. The lines for a polygon form a
closed loop, so the starting node of the first
line in the list also serves as the ending node 
for the last line in the list. 
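The node, line, and polygon tables might be sketched with plain Python dictionaries. This is an illustration only, not the coding used in Figure 2-29 or in any specific GIS; all identifiers, coordinates, and the label "outside" are hypothetical.

```python
# Node table: node ID -> coordinates.
nodes = {1: (0, 0), 2: (4, 0), 3: (4, 3), 4: (0, 3)}

# Line table: line ID -> (from node, to node, left polygon, right polygon).
# "outside" marks the region beyond the mapped polygons.
lines = {
    11: (1, 2, "A", "outside"),
    12: (2, 3, "A", "outside"),
    13: (3, 4, "A", "outside"),
    14: (4, 1, "A", "outside"),
}

# Polygon table: polygon ID -> ordered list of bounding line IDs.
polygons = {"A": [11, 12, 13, 14]}

def forms_closed_loop(line_ids):
    """Check that each line ends where the next begins, wrapping around."""
    for current, following in zip(line_ids, line_ids[1:] + line_ids[:1]):
        if lines[current][1] != lines[following][0]:
            return False
    return True

print(forms_closed_loop(polygons["A"]))
print(forms_closed_loop([11, 12, 13]))
```

The check uses identifiers only; no coordinates are read, which is the practical appeal of storing topology in tables.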

Topological models greatly enhance many vector operations. Adjacency
analyses are reduced to a “table look-up,” a quick and easy operation
in most software systems. Assume a city is represented as a single
polygon, and we seek all neighboring polygons. Adjacency analysis
reduces to 1) scanning the polygon topology table to find the city
polygon and reading the list of lines that bound the polygon, and 2)
scanning this list of lines, accumulating a list of all left and
right polygons. Polygons adjacent to the city may be identified from
this list. List searches on tables are typically much faster than
searches involving coordinate data.
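The two-step adjacency look-up can be sketched as follows. The table contents are hypothetical and are not meant to describe a geometrically complete map; only the left and right polygon columns of the line table are kept, since those are all the look-up needs.

```python
# Line table: line ID -> (left polygon, right polygon).
line_table = {
    1: ("city", "county_north"),
    2: ("county_east", "city"),
    3: ("city", "lake"),
    4: ("county_north", "county_east"),
}

# Polygon table: polygon ID -> IDs of the lines that bound it.
polygon_table = {
    "city": [1, 2, 3],
    "county_north": [1, 4],
    "county_east": [2, 4],
    "lake": [3],
}

def neighbors(polygon_id):
    """Find adjacent polygons using table look-ups only."""
    found = set()
    for line_id in polygon_table[polygon_id]:  # step 1: bounding lines
        found.update(line_table[line_id])      # step 2: left and right polygons
    found.discard(polygon_id)                  # a polygon is not its own neighbor
    return sorted(found)

print(neighbors("city"))
```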


Topological data models often have an advantage of smaller file
sizes, largely because coordinate data are recorded once. For
example, a nontopological approach often stores polygon boundaries
twice. Lines 52 and 53 at the bottom of Figure 2-29 will be recorded
for both polygon A and polygon B. Long, complex boundaries in polygon
data sets may double their size. This increases both storage
requirements and processing.

Figure 2-29: An example of vector features and ...

There are limitations and disadvantages to topological vector models.
First, there are computational costs in defining the topological
structure of a vector data layer. Software must determine the
connectivity and adjacency information, assign codes, and build the
topological tables. Computational costs are typically quite modest
with current computer technologies.

Second, the data must be very “clean,” in that all lines must begin
and end with a node, all lines must connect correctly, and all
polygons must be closed. Unconnected lines or unclosed polygons will
cause errors during analyses. Significant human effort may be
required to ensure clean vector data because each line and polygon
must be checked. Software may help by flagging or fixing “dangling”
nodes that do not connect to other nodes, and by automatically
identifying all polygons. Each dangling node and polygon may then be
checked, and edited as needed to correct errors.

Limitations and the extra editing are far outweighed by the gains in
efficiency and analytical capabilities provided by topological vector
models. Many current vector GIS packages use topological vector
models in some form.

Vector Features, Tables, and Structures


As described earlier, geographic features are associated with
nonspatial attributes in vector models; tables are used to organize
the attributes. In most GIS software, we can most easily view the
tables and a graphic representation of the spatial data as a linked
table and digital map (Figure 2-30, top).


Most GIS employ underlying file structures to organize components of
the spatial data. An example organization is shown in the bottom half
of Figure 2-30, where the topological elements are recorded in a
linked set of tables, in this example one each for the polygons,
lines, and nodes and vertices. Most GIS maintain the spatial and
topological data as a single file or a cluster of linked files. This
internal file structure is often insulated from direct manipulation
by the GIS user, but underlies nearly all spatial data manipulations.
A user may directly edit or otherwise manipulate table values,
usually with the exception of the ID, and the underlying topology and
coordinate data are accessed via requests to display, change, or
analyze the spatial data. Data layers may also include additional
information (not shown) on the origin, region covered, date of
creation, edit history, coordinate system, or other characteristics
of a data set.


Note that not all GIS store coordinate and topological data in
non-tabular file structures. Coordinates, points, lines, polygons,
and other composite features may be stored in tables similar to
attribute tables. It is premature to discuss the details of these
spatially enabled databases, because they are based on something
called a relational data model, described in detail in Chapter 8.
Faster computers support this generally more flexible approach,
allowing simpler and more transparent access across different types
of GIS software.

Raster Data Models
Models and Cells


Raster data models define the world as a regular set of cells in a
grid pattern (Figure 2-31). Typically, these cells are square and
evenly spaced in the X and Y directions. The phenomena or entities of
interest are represented by attribute values associated with each
cell location.

Raster data models are the natural means to represent “continuous”
spatial features or phenomena. Elevation, precipitation, slope, and
pollutant concentration are examples of continuous spatial variables.
These variables characteristically show significant changes in value
over broad areas. The gradients can be quite steep (e.g., at cliffs),
gentle (long, sloping ridges), or quite variable (rolling hills).
Raster data models depict these gradients by changes in the values
associated with each cell.

Raster data sets have a cell dimension, defining the edge length for
each square cell (Figure 2-31). For example, the cell dimension may
be specified as a square 30 meters on each side. The cells are
usually oriented parallel to the X and Y directions, and the
coordinates of a corner location are specified.

When the cells are square and aligned with the coordinate axes, the
calculation of a cell location is a simple process of counting and
multiplication. A cell location may be calculated from the cell size,
known corner coordinates, and cell row and column number. For
example, if we know the lower-left cell coordinate, all other cell
coordinates may be determined by the formulas:

$$N_{\text{cell}} = N_{\text{lower-left}} + \text{row} \times \text{cell size} \qquad (2.2)$$

$$E_{\text{cell}} = E_{\text{lower-left}} + \text{column} \times \text{cell size} \qquad (2.3)$$

where N is the coordinate in the north direction (y), E is the
coordinate in the east direction (x), and the row and column are
counted starting with zero from the lower left cell.
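The cell location formulas translate directly into code. The sketch below reads them as an offset added to the lower-left coordinate, with rows and columns counted from zero at the lower left cell; the function name and the sample coordinates are hypothetical.

```python
def cell_coordinate(n_lower_left, e_lower_left, row, column, cell_size):
    """Return (N, E) for a cell, counting rows and columns from zero."""
    n_cell = n_lower_left + row * cell_size     # northing, equation 2.2
    e_cell = e_lower_left + column * cell_size  # easting, equation 2.3
    return n_cell, e_cell

# Hypothetical grid: lower-left cell at N = 5000, E = 2000, 30 m cells.
print(cell_coordinate(5000.0, 2000.0, row=3, column=5, cell_size=30.0))
```

Row zero and column zero return the lower-left coordinate itself, which is a quick check that the counting convention has been applied as intended.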

There is often a trade-off between spa- 
tial detail and data volume in raster data sets. 
The number of cells needed to cover a given 
area increases four times when the cell size 
is cut in half (Figure 2-32). Smaller cells 
provide greater spatial detail, but at the cost 
of larger data sets.
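The trade-off is easy to quantify. This sketch assumes a rectangular study area and rounds partial cells up at the edges; the function name and the 1200 by 800 unit area are hypothetical.

```python
import math

def cell_count(width, height, cell_size):
    """Number of square cells needed to cover a width by height area."""
    return math.ceil(width / cell_size) * math.ceil(height / cell_size)

# Halving the cell size quadruples the number of cells.
for size in (40, 20, 10):
    print(size, cell_count(1200, 800, size))
```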

The cell dimension also affects the spatial precision of the data
set, and hence positional accuracy. The cell coordinate is usually
defined at a point in the center of the cell. The coordinate applies
to the entire area covered by the cell. Positional accuracy is
typically expected to be no better than one-half the cell size. No
matter the true location of a feature, coordinates are truncated or
rounded to the nearest cell center coordinate. Thus, the cell size
should be no more than twice the desired accuracy and precision for
the data layer represented in the raster, and often it is specified
to be smaller.
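The half-cell limit follows from snapping every location to a cell center. In the sketch below the grid origin is assumed to be the lower-left corner of the lower-left cell, which is one convention among several; the function name and the sample point are hypothetical.

```python
import math

def snap_to_cell_center(x, y, x_origin, y_origin, cell_size):
    """Return the center coordinate of the cell that contains (x, y)."""
    column = math.floor((x - x_origin) / cell_size)
    row = math.floor((y - y_origin) / cell_size)
    return (x_origin + (column + 0.5) * cell_size,
            y_origin + (row + 0.5) * cell_size)

cell_size = 30.0
true_x, true_y = 44.0, 29.0
cx, cy = snap_to_cell_center(true_x, true_y, 0.0, 0.0, cell_size)

print((cx, cy))
print(abs(true_x - cx), abs(true_y - cy))  # each offset is at most 15.0
```

Along each axis the recorded coordinate can differ from the true one by up to half the cell dimension, so a 30-meter cell cannot support positional accuracy better than about 15 meters.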

Each raster cell represents a given area on the ground and is
assigned a value that may be considered to apply to the entire cell.
If the variable is uniform across the raster cell, the value will be
correct over the cell. However, under most conditions there is
within-cell variation, and the raster cell value represents the
average, central, or most common value found in the cell. Consider a
raster data set representing average weekly income with a cell
dimension that is 300 meters (980 feet) on a side. Further suppose
that there is a raster cell with a value of 710 pesos. The entire 300
by 300 meter area is considered to have this value of 710 pesos per
week. There may be many households within the raster cell that do not
earn exactly 710 pesos per week. However, the 710 pesos may be the
average, the highest point, or some other representative value for
the area covered by the cell. While raster cells often represent the
average or the value measured at the center of the cell, they may
also represent the median, maximum, or another statistic for the cell
area.

An alternative interpretation of the raster cell applies the value to
the central point of the cell. Consider a raster grid containing
elevation values. Cells may be specified as 200 meters square, and an
elevation value assigned to each square. A cell with a value of 8,000
meters (26,200 feet) may be assumed to have that value at the center
of the cell, but this value will not be assumed to apply to the
entire cell.

Figure 2-32: The number of cells in a raster data set depends on the
cell size. For a given area, a linear decrease in cell size causes a
quadratic increase in cell number, e.g., halving the cell size causes
a four-fold increase in the number of cells.


A raster data model may also be used to represent discrete data
(Figure 2-33), for example, to represent land cover in an area.
Raster cells typically hold numeric or single-letter alphabetical
characters. A coding scheme defines what land cover type the discrete
values signify. Each code may be found at many raster cells.
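A coded raster and its legend can be sketched as a small grid plus a look-up table. The codes, class names, grid values, and 30-meter cell size below are all hypothetical.

```python
from collections import Counter

# Coding scheme: numeric cell value -> land cover type.
legend = {1: "forest", 2: "water", 3: "urban"}

# A 3 by 3 raster; each code appears at several cells.
grid = [
    [1, 1, 2],
    [1, 3, 2],
    [1, 3, 2],
]

cell_size = 30  # meters on a side
cell_counts = Counter(code for row in grid for code in row)

# Area per class is the cell count times the area of one cell.
area_by_class = {legend[code]: count * cell_size * cell_size
                 for code, count in cell_counts.items()}
print(area_by_class)
```

Because the codes are nominal, counting cells per class is meaningful here, while averaging the code values would not be.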

Raster cell values may be assigned and interpreted in at least seven
different ways (Table 2-1). We have described three: a raster cell as a
point physical value (elevation), as a statistical value (average
income), and as discrete data (land cover). Raster values may also be
used to represent points and lines, as the IDs of lines or points that
occur closest to the cell center.

Table 2-1: Types of data represented by raster cell values.

Data Type            | Description                               | Example
point                | alpha-numeric ID of closest point         | hospital
line                 | alpha-numeric ID of closest line          | nearest road
contiguous region ID | alpha-numeric ID for dominant region      | state
class code           | alpha-numeric code for general class      | vegetation type
table ID             | numeric position in a table               | row
physical analog      | numeric value representing surface value  | elevation
statistical value    | numeric value from a statistical function | population density


Point and line assignment to raster cells may be complicated when there
are multiple features within a single cell. For example, cell value
assignment is straightforward when there is only one light pole in a
cell (Figure 2-34, near A). When there are multiple poles in a single
cell there is some ambiguity or generalization in the assignment
(Figure 2-34, near B). One common solution represents one feature from
the group, and retains information on the attributes and characteristics
of that feature. This entails some data loss. Another solution is to
reduce the raster cell size so that there are no multiple features in a
cell. This may result in impractically large data sets. More complex
schemes may record multiple instances of features in a cell, but these
then may slow access or otherwise decrease the utility that comes from
the simple raster structure.


Similar problems may occur when there are multiple line segments within
a raster cell, for example, when linear features such as roads are
represented in a raster data set. When two or more roads meet, they will
do so within a raster cell, and some set of attributes must be assigned
(Figure 2-34c). Since attributes are assigned by cells, some precedence
must be established, with one road assigned a higher priority.

Raster cell assignment also may be complicated when representing what we
typically think of as discrete, uniform areas. Consider the area in
Figure 2-35. We wish to represent this area with a raster data layer,
with cells assigned to one of two class codes, one each for land or
water. Water bodies appear as darker areas in the image, and the raster
grid is shown overlain. Cells may contain substantial areas of both land
and water, and the proportion of each class may span from zero to 100
percent. Some cells are purely one class and the assignment is
unambiguous; for example, the cell labelled A in Figure 2-35 contains
only land. Others are mixed but mostly one class, such as cell B (water)
or D (land). Some are nearly equal in their proportion of land and
water, as in cell C.

One common method to assign classes for mixed cells is called
“winner-take-all.” The cell is assigned the class of the largest area
type. Cells A, C, and D would be assigned the land class, cell B the
water class. Another option applies preference in cell assignment. If
any of an “important” type is found, then the cell is assigned that
value, regardless of the proportion. If we specify a preference for
water, then cells B, C, and D in Figure 2-35 would be assigned the water
type, and cell A the land type.
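
The two assignment rules can be compared with a short Python sketch. The area fractions below are hypothetical values chosen to be consistent with the description of cells A through D; they are not measured from Figure 2-35, and the function names are illustrative.

```python
def winner_take_all(fractions):
    """Return the class covering the largest share of the cell."""
    return max(fractions, key=fractions.get)


def preferred_class(fractions, preferred):
    """Return the preferred class if any of it occurs in the cell,
    otherwise fall back to the largest class."""
    if fractions.get(preferred, 0.0) > 0.0:
        return preferred
    return winner_take_all(fractions)


# Hypothetical area fractions for cells A-D (assumed, not from the figure).
cells = {
    "A": {"land": 1.00, "water": 0.00},
    "B": {"land": 0.10, "water": 0.90},
    "C": {"land": 0.55, "water": 0.45},
    "D": {"land": 0.85, "water": 0.15},
}

by_majority = {name: winner_take_all(f) for name, f in cells.items()}
by_preference = {name: preferred_class(f, "water") for name, f in cells.items()}

print(by_majority)
print(by_preference)
```

With these fractions the majority rule codes A, C, and D as land, while a water preference codes B, C, and D as water, so half of the cells change class when only the rule changes.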


Regardless of the assignment method used, Figure 2-35 illustrates two
phenomena when discrete objects are represented using a raster data
model. First, some areas that are not the assigned class are included in
some raster cells. These “inclusions” are inevitable because cells must
be assigned to a discrete class, the cell boundaries are rigidly placed,
and the class boundaries on the ground rarely line up with the cell
boundaries. Some mixed cells occur in nearly all raster layers. The GIS
user must acknowledge these inclusions, and consider their impact on the
intended spatial analyses.

Second, differences in class assignment rules may substantially alter
the data layer, as shown in our simple example. In more complex
landscapes, there will be more potential cell types, which may increase
the assignment sensitivity. Decreasing the raster cell size reduces the
significance of classes in the assignment rule, but at the cost of
increased data volumes.


Raster Features and Attribute 
Tables 

Raster layers may also have associated attribute tables. This is most
common when nominal data are represented, but may also be used with
ordinal or interval/ratio data. Just as with topological vector data,
features in the raster layer may be linked to rows in an attribute
table, and these rows may describe the essential nonspatial
characteristics of the features.

Figure 2-36a and b show data represented in a raster model. Figure 2-36a
shows a raster data set that maintains a one-to-one relationship between
raster cells and rows in the data table. An additional column, cell-ID,
must be added to uniquely identify each raster location. The
corresponding attributes IDorg, class, and area are repeated for each
cell. Note that the area values are the same for all cells and thus all
rows in the table.

A one-to-one correspondence is rarely used with raster data sets because
it often would require an unmanageably large attribute table. This small
example results in 100 rows for the attribute table, but we often use
raster data sets with billions of cells. If we insist on a one-to-one
cell/attribute relationship, the table may become too large. Even simple
processes such as sorting, searching, or subsetting records become
prohibitively time consuming. Display and redraw rates become low,
reducing the utility of these data, and decreasing the likelihood that
GIS will be effectively applied.


To avoid these problems, a many-to-one relationship is usually allowed
between the raster cells and the attribute table (Figure 2-36b). Many
raster cells may refer to a single row in the attribute table. This
substantially reduces the size of the attribute table for most data
sets, although it does so at the cost of some spatial ambiguity. There
may be multiple, noncontiguous patches for a specific type. For example,
the upper left and lower right portions of the raster data set in
Figure 2-36a are both of class 10. Both are recognized as distinct
features in the vector and one-to-one raster representation, but are
represented by the same attribute entry in the many-to-one raster
representation. This reduces the size of the attribute table, but at the
cost of reducing its flexibility. Many-to-one relationships effectively
create multipart areas. The data for the represented variable may be
summarized by class; however, these classes may or may not be spatially
contiguous.
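
The difference in table size can be sketched as follows. This is a simplified illustration under stated assumptions: the class codes, the 4 by 4 grid, and the cell area are hypothetical and do not reproduce Figure 2-36, and build_many_to_one_table is an illustrative helper rather than a function from any GIS package.

```python
# Hypothetical class raster; class 10 occurs in two separate patches
# (upper left and lower right), as in the situation described above.
class_raster = [
    [10, 10, 20, 20],
    [10, 20, 20, 20],
    [30, 30, 20, 10],
    [30, 30, 10, 10],
]
CELL_AREA = 100.0  # assumed cell area in square meters


def build_many_to_one_table(raster, cell_area):
    """Build one table row per class; every cell of a class shares it."""
    table = {}
    for row in raster:
        for code in row:
            entry = table.setdefault(code, {"cell_count": 0, "area": 0.0})
            entry["cell_count"] += 1
            entry["area"] += cell_area
    return table


attribute_table = build_many_to_one_table(class_raster, CELL_AREA)
one_to_one_rows = sum(len(row) for row in class_raster)

print(one_to_one_rows, "rows one-to-one;", len(attribute_table), "rows many-to-one")
```

The many-to-one table has one row per class regardless of raster size, but it can no longer distinguish the two separate patches of class 10.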

An alternative is to maintain the one-to-one relationship, but to index
all the raster cells in a contiguous group, thereby reducing the number
of rows in the attribute table. This requires software to develop and
maintain the indices, and to create them and reconstitute the indexing
after spatial operations. These indexing schemes add overhead and
increase data model complexity, thereby removing one of the advantages
of raster data sets over vector data sets.


A Comparison of Raster and Vector Data Models


The question often arises, “Which are better, raster or vector data
models?” The answer is neither and both. Neither data model is better in
all conditions or for all data. Both have advantages and disadvantages
relative to each other and to additional, more complex data models
(Table 2-2). In some instances it is preferable to maintain data in a
raster model, and in others in a vector model. Most data may be
represented in both, and may be converted among data models. As an
example, land cover may be represented as a set of polygons in a vector
data model or as a set of identifiers for cells in a raster grid. The
choice often depends on a number of factors, including the predominant
type of data (discrete or continuous), the expected types of analyses,
available storage, the main sources of input data, and the expertise of
the human operators.

Raster data models exhibit several advantages relative to vector data
models. First, raster data models are particularly suitable for
representing themes or phenomena that change frequently in space. Each
raster cell may contain a value different than its neighbors. Thus,
trends as well as more rapid variability may be represented.

Raster data structures are generally simpler than vector data models,
particularly when a fixed cell size is used. Most raster models store
cells as sets of rows, with cells organized from left to right, and rows
stored from top to bottom. This organization is quite easy to code in an
array structure in most computer languages.
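
The row-ordered layout described above can be sketched with a flat Python list. The grid values, the origin, and the function names cell_index and cell_at are assumptions made for illustration; real raster formats add headers and other metadata.

```python
def cell_index(row, col, n_cols):
    """Position of a cell in a flat, row-major array."""
    return row * n_cols + col


def cell_at(x, y, x_min, y_max, cell_size):
    """Row and column of the cell containing map coordinate (x, y).
    Row 0 is the top row, so rows increase as y decreases."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


N_ROWS, N_COLS = 3, 4
values = [
    5, 5, 6, 7,  # top row, stored left to right
    4, 5, 6, 6,
    3, 4, 4, 5,  # bottom row
]

# Assumed grid: upper-left corner at (0, 300), 100-unit square cells.
row, col = cell_at(x=250.0, y=120.0, x_min=0.0, y_max=300.0, cell_size=100.0)
value = values[cell_index(row, col, N_COLS)]
print(row, col, value)
```

Because a cell's position follows directly from its row and column, no search is needed to find the value at a location.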

Raster data models are often faster in layer overlays. A raster cell
corresponds to a given location. Data in different layers align
cell-to-cell over this location. Overlay involves locating a grid cell
and comparing the values found in other layers for the same location.
This cell look-up is quite rapid in most raster data structures.

Table 2-2: A comparison of raster and vector data models.

Characteristic        | Raster                                   | Vector
data structure        | usually simple                           | usually complex
storage requirements  | larger for most data sets without        | smaller for most data sets
                      | compression                              |
coordinate conversion | may be slow due to data volumes, and     | simple
                      | require resampling                       |
analysis              | easy for continuous data, simple for     | preferred for network analyses,
                      | many layer combinations                  | many other spatial operations
                      |                                          | more complex
spatial precision     | floor set by cell size                   | limited only by positional
                      |                                          | measurements
accessibility         | easy to modify or program, due to        | often complex
                      | simple data structure                    |
display and output    | good for images, but discrete features   | map-like, with continuous
                      | may show “stairstep” edges               | curves, poor for images
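
A cell-by-cell overlay can be sketched as below. The two layers, their values, and the overlay helper are hypothetical; the sketch assumes the layers already share the same extent and cell size, which is the alignment the text describes.

```python
def overlay(layer_a, layer_b, combine):
    """Combine two aligned raster layers cell by cell."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have the same dimensions")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical layers: 1 marks forest cells, slope is in percent.
forest = [[1, 1, 0], [0, 1, 1]]
slope = [[5, 30, 40], [10, 12, 35]]

# Flag forested cells on slopes steeper than 25 percent.
steep_forest = overlay(forest, slope, lambda f, s: int(f == 1 and s > 25))
print(steep_forest)
```

Each output cell depends only on the cells at the same row and column in the inputs, so no geometric intersection has to be computed.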


Finally, raster data structures are the most practical method for
storing, displaying, and manipulating digital image data, such as aerial
photographs and satellite imagery. Digital image data are an important
source of information when building, viewing, and analyzing spatial
databases. Image display and analysis are based on raster operations to
sharpen details on the image, specify the brightness, contrast, and
colors for display, and to aid in the extraction of information.


Vector data models provide some advantages relative to raster data
models. First, vector models often lead to more compact data storage,
particularly for discrete objects. Large homogeneous regions are
recorded by the coordinate boundaries in a vector data model. These
regions are recorded as a set of cells in a raster data model. The
perimeter grows more slowly than the area for most feature shapes, so
the amount of data required to represent an area increases much more
rapidly with a raster data model. Vector data are much more compact than
raster data for most themes and levels of spatial detail.

Vector data are a more natural means for representing networks and other
connected linear features. Vector data by their nature store information
on intersections (nodes) and the linkages between them (lines). Traffic
volume, speed, timing, and other factors may be associated with lines
and intersections to model many kinds of networks.

Vector data models are easily presented in a preferred map format.
Humans are familiar with continuous line and rounded curve
representations in hand- or machine-drawn maps. Raster data often show a
“stair-step” edge for curved boundaries, particularly when the cell
resolution is large relative to the resolution at which the raster is
displayed. Vector data may be plotted with more visually appealing
continuous lines and rounded edges.

Vector data models ease the calculation and storage of topological
information. Topological information aids in performing adjacency,
connectivity, and other analyses in an efficient manner. Topological
information also allows some forms of automated error and ambiguity
detection, leading to improved data quality.


Conversion Between Raster and 
Vector Models 


Spatial data may be converted between raster and vector data models.
Vector-to-raster conversion involves assigning a cell value for each
position occupied by vector features. Vector point features are
typically assumed to have no dimension. Points in a raster data set must
be represented by a value in a raster cell, so points have at least the
dimension of the raster cell after conversion from vector-to-raster
models. Points are usually assigned to the cell containing the point
coordinate. The cell in which the point resides is given a number or
other code identifying the point feature occurring at the cell location.
If the cell size is too large, two or more vector points may fall in the
same cell, and either an ambiguous cell identifier assigned, or a more
complex numbering and assignment scheme implemented. Typically, a cell
size is chosen such that the diagonal cell dimension is smaller than the
distance between the two closest point features.
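
The cell-size guideline can be checked with a short sketch. The point coordinates, grid origin, and function names are hypothetical; the sketch assumes square cells, for which the diagonal is the cell size times the square root of two.

```python
import math
from itertools import combinations


def cell_size_limit(points):
    """Cell size at which the cell diagonal equals the smallest point
    spacing; cell sizes below this keep every point in its own cell."""
    closest = min(math.dist(p, q) for p, q in combinations(points, 2))
    return closest / math.sqrt(2)


def rasterize_points(points, x_min, y_max, cell_size):
    """Map (row, col) to the list of point ids falling in that cell."""
    cells = {}
    for point_id, (x, y) in enumerate(points):
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        cells.setdefault((row, col), []).append(point_id)
    return cells


# Hypothetical point features in map units.
points = [(10.0, 90.0), (40.0, 50.0), (45.0, 45.0)]

limit = cell_size_limit(points)
coarse = rasterize_points(points, x_min=0.0, y_max=100.0, cell_size=50.0)
fine = rasterize_points(points, x_min=0.0, y_max=100.0, cell_size=4.0)

print(round(limit, 3), coarse, fine)
```

With 50-unit cells two points share a cell and one identifier must be chosen; with 4-unit cells, below the limit, every point has its own cell, at the cost of many more cells.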
Vector line features in a data layer may also be converted to a raster
data model. Raster cells may be coded using different criteria. One
simple method assigns a value to a cell if a vector line intersects with
any part of the cell (Figure 2-37a, left). This ensures the maintenance
of connected lines in the raster form of the data. This assignment rule
often leads to wider than appropriate lines because several adjacent
cells may be assigned as part of the line, particularly when the line
meanders near cell edges. Other assignment rules may be applied, for
example, assigning a cell as occupied by a line only when the cell
center is near a vector line segment (Figure 2-37a, right). “Near” may
be defined as some sub-cell distance, for instance, 1/3 the cell width.
Lines passing through the corner of a cell will not be recorded as in
the cell. This may lead to thinner linear features in the raster data
set, but often at the cost of line discontinuities.
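
The two line-assignment rules can be contrasted in a small sketch. This is one possible implementation under stated assumptions, not the method of any particular GIS: the segment-box test uses Liang-Barsky clipping, rows are counted upward from y = 0 to keep the arithmetic simple, and the line coordinates are hypothetical.

```python
import math


def segment_hits_box(p, q, x0, y0, x1, y1):
    """Liang-Barsky test: does segment p-q touch the box [x0,x1] x [y0,y1]?"""
    t0, t1 = 0.0, 1.0
    for start, delta, lo, hi in (
        (p[0], q[0] - p[0], x0, x1),
        (p[1], q[1] - p[1], y0, y1),
    ):
        if delta == 0:
            if start < lo or start > hi:
                return False
        else:
            ta, tb = (lo - start) / delta, (hi - start) / delta
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True


def distance_to_segment(c, p, q):
    """Shortest distance from point c to segment p-q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(c, p)
    t = ((c[0] - p[0]) * dx + (c[1] - p[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(c, (p[0] + t * dx, p[1] + t * dy))


def rasterize_line(p, q, n_rows, n_cols, cell_size, near=None):
    """Cells coded as line. near=None applies the any-cell rule; otherwise
    the cell center must lie within distance `near` of the line."""
    cells = set()
    for row in range(n_rows):
        for col in range(n_cols):
            x0, y0 = col * cell_size, row * cell_size
            if near is None:
                hit = segment_hits_box(p, q, x0, y0, x0 + cell_size, y0 + cell_size)
            else:
                center = (x0 + cell_size / 2, y0 + cell_size / 2)
                hit = distance_to_segment(center, p, q) <= near
            if hit:
                cells.add((row, col))
    return cells


# Hypothetical line that runs close to the edge between two cell rows.
start, end = (0.0, 0.8), (4.0, 1.3)
any_cell = rasterize_line(start, end, n_rows=2, n_cols=4, cell_size=1.0)
near_center = rasterize_line(start, end, 2, 4, 1.0, near=1.0 / 3.0)

print(sorted(any_cell))
print(sorted(near_center))
```

For this line the any-cell rule codes five connected cells, while the near-center rule with a one-third cell threshold codes only one, which illustrates both the widening and the discontinuity described above.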

The output from vector-to-raster conversion depends on the algorithm
used, even though you use the same input. This brings up an important
point to remember when applying any spatial operation. The output often
depends in subtle ways on the spatial operation. What appear to be quite
small differences in the algorithm or key defining parameters may lead
to quite different results. The ease of spatial manipulation in a GIS
provides a powerful and often easy-to-use set of tools. The GIS user
should bear in mind that these tools can be more efficient at producing
errors as well as more efficient at providing correct results. Until
sufficient experience is obtained with a suite of algorithms, in this
case vector-to-raster conversion, small, controlled tests should be
performed to verify the accuracy of a given method or set of
constraining parameters.
Up to this point we have covered vector-to-raster data conversion. Data
may also be converted in the opposite direction, from raster to vector
data. Point, line, or area features represented by raster cells are
converted to corresponding vector data coordinates and structures. Point
features are represented as single raster cells. Each vector point
feature is usually assigned the coordinate of the corresponding cell
center.

Linear features represented in a raster environment may be converted to
vector lines. Conversion to vector lines typically involves identifying
the continuous connected set of grid cells that form the line. Cell
centers are typically taken as the locations of vertices along the line
(Figure 2-37b). Lines may then be “smoothed” using a mathematical
algorithm to remove the “stair-step” effect.


Raster Geometry and Resampling
Data often must be resampled when converting between raster and vector
data, or changing the cell size of a raster data set (Figure 2-38).
Resampling involves reassigning cell values when changing raster
coordinates or geometry. Resampling is required when changing cell sizes
because the new cell centers will not align exactly with old cell
centers. Changing coordinate systems may change the direction of the x
and y axes, and GIS systems often require that the cell edges align with
the coordinate system axes. Hence, the new cells often do not correspond
to the same locations or extents as the old cells.


Common resampling approaches include the nearest neighbor (taking the
output layer value from the nearest input layer cell center), bilinear
interpolation (distance-based averaging of the four nearest cells), and
cubic convolution (a weighted average of the sixteen nearest cells,
Figure 2-38).
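
Nearest neighbor resampling is the simplest of the three to sketch. The version below is a minimal illustration that assumes square cells and that the input and output grids share the same upper-left corner and axis directions; the function name and sample grid are hypothetical.

```python
def resample_nearest(grid, old_size, new_size):
    """Resample a square-celled raster to a new cell size, taking each
    output value from the input cell whose center is nearest."""
    n_rows, n_cols = len(grid), len(grid[0])
    out_rows = int(n_rows * old_size / new_size)
    out_cols = int(n_cols * old_size / new_size)
    output = []
    for r in range(out_rows):
        # Center of the output cell, measured from the upper-left corner.
        y_center = (r + 0.5) * new_size
        src_row = min(int(y_center // old_size), n_rows - 1)
        out_row = []
        for c in range(out_cols):
            x_center = (c + 0.5) * new_size
            src_col = min(int(x_center // old_size), n_cols - 1)
            out_row.append(grid[src_row][src_col])
        output.append(out_row)
    return output


coarse_grid = [[1, 2], [3, 4]]
finer = resample_nearest(coarse_grid, old_size=10.0, new_size=5.0)
for line in finer:
    print(line)
```

Nearest neighbor never creates new values, which makes it the usual choice for class codes, where an averaged code would be meaningless.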

An example of a bilinear interpolation is shown in Figure 2-39. This
algorithm uses a distance-weighted average of the four nearest cells in
the input to calculate the value for the output. The new output location
is indicated by the black post. Initially, the height, or Z_out value,
of the output location is unknown. Z_out is calculated based on the
distances between the output locations and the input locations. The
distance in the x direction is denoted in Figure 2-39 by d1, and the
distance in the y direction by d2. The values in the input are shown as
gray posts and are labeled as Z1 through Z4. Intermediate heights Z_u
and Z_b are shown. These represent the average of the input values when
taken in pairs in the x direction. These pairs are Z1 and Z2, to yield
Z_u, and Z3 and Z4, to yield Z_b. Z_u and Z_b are then averaged to
calculate Z_out, using the distance d2 between the input and output
locations to weight values at each input location. The cubic convolution
resampling calculation is similar, except that more cells are used, and
the weighting is not an average based on linear distance.

[Figure 2-39: The bilinear interpolation method, showing the output
raster location Z_out and the input values Z1 through Z4.]
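
The calculation can be written out as a short function. This is a sketch under assumptions that the garbled figure does not let us confirm: Z1 and Z2 are taken as the upper pair of input cells and Z3 and Z4 as the lower pair, d1 is measured in x from the Z1/Z3 column, d2 is measured in y from the Z1/Z2 row, and spacing is the distance between input cell centers.

```python
def bilinear(z1, z2, z3, z4, d1, d2, spacing):
    """Distance-weighted average of four input cell values.

    Assumed layout: z1 upper left, z2 upper right, z3 lower left,
    z4 lower right; d1 and d2 are offsets from z1 in x and y.
    """
    wx = d1 / spacing
    wy = d2 / spacing
    z_u = z1 + (z2 - z1) * wx  # interpolate along x between the upper pair
    z_b = z3 + (z4 - z3) * wx  # interpolate along x between the lower pair
    return z_u + (z_b - z_u) * wy  # interpolate along y between the two


# Output location at the middle of four input cell centers.
z_out = bilinear(10.0, 20.0, 30.0, 40.0, d1=5.0, d2=5.0, spacing=10.0)
print(z_out)
```

When the output location coincides with an input cell center (d1 and d2 both zero) the function returns that cell's value unchanged, a useful sanity check for any resampling routine.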
Other Data Models
Triangulated Irregular Networks 


A triangulated irregular network (TIN) is a data model commonly used to
represent terrain heights. Typically, the x, y, and z locations for
measured points are entered into the TIN data model. These points are
connected such that the smallest triangle possible spans any three
adjacent points. The TIN forms a connected network of triangles
(Figure 2-40). Delaunay triangles are created such that the line
crossings are avoided. Triangulation identifies the convergent circle
for a set of three points, defined as a circle passing through all three
points. A triangle is drawn only if the corresponding convergent circle
contains no other sampling point. Each triangle defines a facet of
uniform slope and aspect over the triangle.
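
The empty-circle rule can be demonstrated with a brute-force sketch. It tests every triple of points, which is far too slow for real terrain data and is not how production TIN software works; the sample points and function names are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Center and squared radius of the circle through a, b, and c,
    or None when the three points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    a_sq, b_sq, c_sq = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a_sq * (by - cy) + b_sq * (cy - ay) + c_sq * (ay - by)) / d
    uy = (a_sq * (cx - bx) + b_sq * (ax - cx) + c_sq * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Keep each triple of points whose circle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r_sq = circle
        others = (p for n, p in enumerate(points) if n not in (i, j, k))
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r_sq for px, py in others):
            triangles.append((i, j, k))
    return triangles


# Hypothetical sample locations (x, y); z values are not needed here.
sample_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 4.0)]
triangles = delaunay_triangles(sample_points)
print(triangles)
```

Of the four possible triangles for these points, only two pass the empty-circle test, and together they tile the area without crossing edges.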

The TIN model typically uses some form of indexing to connect
neighboring points. Each edge of a triangle connects to two points,
which in turn each connect to other edges. These connections continue
recursively until the entire network is spanned.

While the TIN model may be more complex than simple raster models, it is
often more efficient for storing terrain data in areas with variable
relief. Relatively few points are required to represent large, flat, or
smoothly continuous areas. Many more points are desirable for rugged
terrain. Surveyors often collect more samples per unit area where the
terrain is highly variable. A TIN easily accommodates these differences
in sampling density, resulting in more, smaller triangles in the densely
sampled areas.


Object Data Models 


The object data model is an alternative for structuring spatial data. A
main goal is to raise the level of abstraction so that the data objects
may be conceptualized and addressed in a more natural way. Objects are
often geographic features, with spatial and attribute data associated
with the object, e.g., a city object may include information on the city
boundary, streets, building locations, waterways, or other features in
organized data structures. Vector topology could be included,
incorporated within the single object. Object model approaches have been
adopted by at least one major vendor of GIS software and are applied in
a number of fields.

Object models for spatial data often follow a logical model, a user's
view of the real objects we portray with a GIS (Figure 2-41). This model
includes all the “things” of interest, and the relationships among them.
Things, or objects, might include power poles, transformers, powerlines,
meters, and customer buildings in a city, and relationships among them
would include a transformer on a pole, lines between poles, and meters
at points along the lines. The logical model is often represented as a
box diagram.

Most object models define the properties of each object, and the
relationships among objects. Pipe objects may have a diameter, material
type, and be connected to valves and tanks. The pipes may be represented
by lines and the valves by points, but these vector elements are
enhanced in the object model because the specific pipe and pipe
properties may be linked to the specific valve attached to a given
location. Object models can have inheritance, automatically transferring
properties within classes of objects. We may create a generic valve
object, with a maximum pressure rating, cost, and material type. We may
create valve subclasses within this class, e.g., emergency cut-off
valves, primary control valves, or shunt valves. These subclasses will
inherit all the property variables from a generic valve in that each has
a cost, maximum pressure, and material, but each subclass may also have
additional unique properties.
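
Inheritance of this kind maps directly onto classes in most programming languages. The sketch below uses Python dataclasses; the class names follow the valve example, but the units and the subclass-specific properties (closing_time_s, bypass_diameter_mm) are hypothetical additions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Generic valve: properties shared by every valve subclass."""
    max_pressure_kpa: float
    cost: float
    material: str


@dataclass
class EmergencyCutoffValve(Valve):
    """Inherits the generic properties and adds one of its own."""
    closing_time_s: float = 0.0  # hypothetical extra property


@dataclass
class ShuntValve(Valve):
    bypass_diameter_mm: float = 0.0  # hypothetical extra property


cutoff = EmergencyCutoffValve(
    max_pressure_kpa=1600.0, cost=250.0, material="brass", closing_time_s=2.0
)
shunt = ShuntValve(max_pressure_kpa=900.0, cost=120.0, material="steel")

print(cutoff)
print(isinstance(shunt, Valve))
```

Code written against the generic Valve, such as a routine that totals cost, works on every subclass without change, which is the practical benefit of inheritance.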

Figure 2-42 shows an example of an object data model for hydrologic
basins and related stream features. The top frame shows features; in
this example basins, sub-basins, a stream network, and features on the
stream network such as sampling stations. The bottom panel shows the
feature types, attributes, and properties in the object model. Note that
there are both object properties and topological relationships
represented, and that multiple feature types may be represented in the
object model.

The object data model has both advantages and disadvantages when
compared to traditional data models. Some geographic entities may be
naturally and easily identified as discrete units for particular
problems, and so may be naturally amenable to an object-oriented
approach. Some proponents claim object models are more easily
implemented across a wider range of database software, particularly for
complex models. However, object data models are less useful for
representing continuously varying features, such as elevation. In
addition, for many problems, object definition and indexing may be quite
complex. Software developers have had difficulty developing generic
tools that may quickly implement object models, so there is an added
level of specialized training required. Finally, we note that there is
no widely accepted, formal definition of what constitutes an object data
model.
Three-Dimensional Data Models


GIS in built environments are increasingly integrating three-dimensional
(3-D) information such as building heights, roof shapes, and other
height-related characteristics (Figure 2-43). This is in part to support
analysis, and in part to generate visualizations from at or near
ground-level, for example, building appearance from a nearby road.
Improved data capture allows for the rapid development of 3-D data that
must be integrated into an appropriate data model.


Several 3-D data models have been proposed, with “vector-like” models
more commonly applied than “raster-like” models. The latter employ the
concept of voxels, or volume elements, essentially cubes of a fixed
dimension. Flat or baseline areas have zero voxels stacked over the
surface, with a requisite number of voxels “stacked” at each raster cell
to represent height. While easy to access and simple for all the reasons
a raster set is, they also suffer from the same drawbacks, particularly
the trade-off between precision, or voxel size, and data volumes.

Many alternative “vector-like” 3-D models have been proposed, many
defining a body element, in addition to the standard point, line, and
polygon elements. Points are used to create lines, lines for polygons,
and polygons to create bodies. Much like 2-D vectors, 3-D vectors add
indexing schemes, with the added complication of a z coordinate to any
element above the base plane of the data layer.

One 3-D model, developed by Bentley Systems, is called a reality mesh
(Figure 2-44). It combines a three-dimensional triangulated irregular
network with an image-based texture surface to create realistic
representations of 3-D features. Complex 3-D surfaces can be efficiently
represented, demanding less storage space, while properties like
material or color can be associated with each of the triangular faces. A
compressed raster “texture surface” may be wrapped around the 3-D TIN,
yielding substantial detail. The images may be compressed, using methods
described later in this chapter, to reduce data volumes.

While vector 3-D models are becoming common, no one model form or
standard has been widely adopted. Three-dimensional GIS products have
become quite mature, for example, the 3-D spatial analyst and 3-D
tured, stable, productive tools, supporting start to finish workflows,
from 2-D to 3-D data conversion, 3-D spatial data ingestion, processing
and organization, and output to project, video, and interactive Web
services. Three-dimensional GIS processing tools are available from
other vendors, and as a plug-in for QGIS.
Multiple Models


Digital data may often be represented using any one of several data
models. The analyst must choose which representation to use. Digital
elevation data are perhaps the best example of the use of multiple data
models to represent the same theme (Figure 2-45). Digital
representations of terrain height have a long history and widespread use
in GIS. Elevation data and derived surfaces such as slope and aspect are
important in hydrology, transportation, ecology, urban and regional
planning, utility routing, and a number of other activities that are
analyzed using GIS. Because of this widespread importance, digital
elevation data are commonly represented in a number of data models.

Raster grids, TINs, and vector contours are the most common data
structures used to organize and store digital elevation data. Raster and
TIN data are often called digital elevation models (DEMs) or digital
terrain models (DTMs), and are commonly used in terrain analysis.
Contour lines are most often used as a form of input, or as a familiar
form of output. Historically, hypsography (terrain height) was depicted
on maps as contour lines (Figure 2-45, bottom left). Contours represent
lines of equal elevation, typically spaced at fixed elevation intervals
across the mapped areas. Because many important analyses are more
difficult using contour lines, most digital elevation data are stored
using raster or TIN models.
Data and File Structures


Binary and ASCII Numbers 


No matter which spatial data model is used, the concepts must be
translated into a set of numbers stored on a computer. All information
stored on a computer in digital format may be represented as a series of
0's and 1's. These data are said to be stored in a binary format,
because each digit may contain one of two values, 0 or 1. Binary numbers
are in a base of 2, so each successive column of a number represents a
power of two.

We use a similar column convention in our familiar ten-based (decimal)
numbering system. As an example, consider the number 47, which we
represent using two columns. The seven in the first column indicates
there are seven units of one. The four in the tens column indicates
there are four units of ten. Each higher column represents a higher
power of ten. The first column represents one (10^0 = 1), the next
column represents tens (10^1 = 10), the next column represents hundreds
(10^2 = 100), and upward for successive powers of ten. We add up the
values represented in the columns to decipher the number.

Binary numbers are also formed by representing values in columns. In a
binary system each column represents a successively higher power of two
(Figure 2-46). The first (rightmost) column represents 1 (2^0 = 1), the
second column (from right) represents twos (2^1 = 2), the third (from
right) represents fours (2^2 = 4), then eight (2^3 = 8), sixteen
(2^4 = 16), and upward for successive powers of two. Thus, the binary
number 1001 represents the decimal number 9: a one from the rightmost
column, and eight from the fourth column (Figure 2-46, left).
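
The column-by-column addition can be written as a few lines of Python. The function name binary_to_decimal is illustrative; Python's built-in int(text, 2) performs the same conversion and is used here only as a cross-check.

```python
def binary_to_decimal(bits):
    """Add up the power of two represented by each column holding a 1."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        total += int(digit) * 2 ** position
    return total


nine = binary_to_decimal("1001")          # 8 + 0 + 0 + 1
forty_seven = binary_to_decimal("101111")  # 32 + 0 + 8 + 4 + 2 + 1
print(nine, forty_seven, int("1001", 2))
```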
Each digit or column in a binary number is called a bit, and eight
columns, or bits, are called a byte. A byte is a common unit for
defining data types and numbers, for example, a data file may be
referred to as containing 4-byte integer numbers. This means each number
is represented by 4 bytes of binary data (or 8 x 4 = 32 bits).


Several bytes are required when representing larger numbers. For
example, one byte may be used to represent 256 different values. When a
byte is used for nonnegative integer numbers, then only values from 0 to
255 may be recorded. This will work when all values are 255 or below,
but consider an elevation data layer with values greater than 255. If
the data are not rescaled, then more than one byte of storage is
required for each value. Two bytes will store up to 65,536 different
numbers. Terrestrial elevations measured in feet or meters are all below
this value, so two bytes of data are often used to store elevation data.
Real numbers such as 12.19 or 865.3 typically require more bytes, and
are effectively split, that is, two bytes for the whole part of the real
number, and four bytes for the fractional portion.
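
The byte limits above can be verified with Python's standard struct module. This sketch covers only the unsigned integer cases; the elevation of 8,000 reuses the example value from earlier in the chapter, and the helper unsigned_range is illustrative.

```python
import struct


def unsigned_range(n_bytes):
    """Count of distinct values and largest nonnegative integer
    that fit in n_bytes."""
    count = 2 ** (8 * n_bytes)
    return count, count - 1


one_byte = unsigned_range(1)
two_bytes = unsigned_range(2)

# "<H" is a 2-byte unsigned integer, "<B" a 1-byte unsigned integer.
packed = struct.pack("<H", 8000)
restored = struct.unpack("<H", packed)[0]

try:
    struct.pack("<B", 300)
    fits_in_one_byte = True
except struct.error:
    fits_in_one_byte = False

print(one_byte, two_bytes, len(packed), restored, fits_in_one_byte)
```

An elevation of 8,000 fits in two bytes, while a value of 300 is rejected by the one-byte format, which is the situation that forces either rescaling or a wider data type.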


Binary numbers are often used to represent codes. Spatial and attribute
data may then be represented as text or as standard codes. This is
particularly common when raster or vector data are converted for export
or import among different GIS software systems. For example, ArcGIS, a
widely used GIS, produces several export formats that are in text or
binary formats. Idrisi, another popular GIS, supports binary and
alphanumeric raster formats.

One of the most common number coding
schemes uses ASCII designators. ASCII
stands for the American Standard Code for
Information Interchange. ASCII is a standardized,
widespread data format that uses
seven bits, or the numbers 0 through 127, to
represent text and other characters. An
extended ASCII, or ANSI (American
National Standards Institute) scheme, uses
these same codes, plus an extra binary bit to
represent numbers between 128 and 255.
These codes are then used in many programs,
including GIS, particularly for data
export or exchange.

ASCII codes allow us to easily and uniformly
represent alphanumeric characters
such as letters, punctuation, other characters,
and numbers. ASCII converts binary numbers
to alphanumeric characters through an
index. Each alphanumeric character corresponds
to a specific number between 0 and
255, which allows any sequence of characters
to be represented by a number. One byte
is required to represent each character in
extended ASCII coding, so ASCII data sets
are typically much larger than binary data
sets. Geographic data in a GIS may use a
combination of binary and ASCII data stored
in files. Binary data are typically used for
coordinate information, and ASCII or other
codes may be used for attribute data.
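
The character index and the size difference between ASCII and binary storage can be illustrated with a few lines of Python. The text "GIS" and the number 1234567 are arbitrary examples chosen for this sketch.

```python
import struct

# Each character maps to one numeric code, and back again.
text = "GIS"
codes = [ord(character) for character in text]
print(codes)
print("".join(chr(code) for code in codes))

# The same integer stored as ASCII text and as a 4-byte binary integer.
number = 1234567
as_ascii = str(number).encode("ascii")   # one byte per digit
as_binary = struct.pack("<i", number)    # always four bytes
print(len(as_ascii), "bytes as ASCII,", len(as_binary), "bytes as binary")
```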


Pointers and Indexes 


Data files may be linked by file pointers,
indexes, or other structures. A pointer is an
address or index that connects one file location
to another. Pointers are a common way
to organize information within and across
multiple files. Figure 2-47 depicts an example
of the use of pointers to organize spatial
data. In this figure, the polygon is composed
of a set of lines. Pointers are used to link the
set of lines that form each polygon. There is
a pointer from each line to the next line,
forming a chain that defines the polygon
boundary.

Pointers help by organizing data in such
a way as to improve access speed. Unorganized
data would require time-consuming
searches each time a polygon boundary was
to be identified. Pointers also allow efficient
use of storage space. In our example, each
line segment is stored only once. Several
polygons may point to the line segment, as it
is typically much more space efficient to add
pointers than to duplicate the line segment.
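
A minimal sketch of this idea follows, using Python dictionaries in place of file pointers. The line identifiers, coordinates, and polygon names are all hypothetical, and a real format would store the pointers as file offsets or record numbers rather than dictionary keys.

```python
# Each line segment is stored once, under a hypothetical numeric identifier.
lines = {
    1: [(0, 0), (4, 0)],
    2: [(4, 0), (4, 3)],
    3: [(4, 3), (0, 3)],
    4: [(0, 3), (0, 0)],
    5: [(4, 0), (8, 0)],
    6: [(8, 0), (8, 3)],
    7: [(8, 3), (4, 3)],
}

# Each polygon stores only pointers (line identifiers), in boundary order.
polygons = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 2]}


def boundary(polygon_id):
    """Follow a polygon's pointers to collect its boundary segments."""
    return [lines[line_id] for line_id in polygons[polygon_id]]


def shared_lines(first, second):
    """Identifiers of the lines that two polygons both point to."""
    return set(polygons[first]) & set(polygons[second])


print(boundary("A"))
print(shared_lines("A", "B"))  # line 2 is stored once but used twice
```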

Shapefiles are a common vector spatial
data format that uses an index to link files.
Shapefiles were originally developed by
ESRI as a way to store point, line, and polygon
features, although they have since been
adopted as a common format for data interchange
and analysis. Shapefiles are supported
by most GIS software that
processes vector data.

Figure 2-47: Pointers are used to organize vector data. Pointers reduce redundant storage and
increase speed of access.

Shapefiles represent layers with a cluster
of files. Each file has the same base name
but a different filename extension, indicated
by a suffix, for example, the "shp" in the
filename "boundary.shp." A transportation
data layer stored in shapefile format might
have the base name of roads, with different
suffixes for different files:

roads.shp

roads.shx

roads.dbf

roads.prj

etc.

The first three files above are all
required to represent a vector data layer
using shapefiles. These files are connected
using indices, numbers that identify connections
and groupings for various components.
The .shp file contains the coordinates that
represent each road, organized by line segments.
There is general information for each
segment, and then a list of coordinates and
other data for the segment. This is followed
by general information for the next segment,
and another list. The roads.shx file contains
indices that point to the segment records in
the .shp file, based on these identifiers. This
speeds access, avoiding a search in the .shp
file each time it links segments in a road.
The roads.dbf file also uses an index to point
to the combined roads in the .shp and .shx
files. A group of segments may be used to
form a line, and associated with a set of attributes
stored in a .dbf file.


Because pointers and indices are key
elements in organizing the spatial data, altering
them directly will usually cause problems.
Typically, these indices are created by
the software during processing, and updated
as needed when data are added, modified, or
analyzed. Pointers may be visible, for example,
the OID columns in the .dbf tables used
with shapefiles, but manually changing the
values will often ruin the data layer. You
should know the identity and use of pointers
in your data sets so that you don't change
them inadvertently.

Pointers, indexing, and multifile layers 
are not limited to vector data. Many raster 
formats store a majority of the cell data in 
one file, and additional, linked information
in an associated file. You must be careful
when transferring a data layer to include all
the associated files. For example, copying
the roads.shp and roads.dbf files to a new
location does not copy a usable data layer.
The software expects a .shx file; an incomplete
file set is often useless.
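
One way to guard against an incomplete copy is to check for the required companion files before moving a layer. The sketch below is an assumption-laden illustration: the function name missing_parts is invented, it checks only the three required extensions named above, and it treats extensions as lowercase.

```python
from pathlib import Path

# The three files the text lists as required for a shapefile layer.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that are absent for this base name."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]


# Lists whichever required files are not found beside roads.shp.
print(missing_parts("roads.shp"))
```

An empty list means all three required files share the base name and sit in the same directory.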


Two more complex structures are common,
ESRI Geodatabases and the Open Geospatial
Consortium (OGC) GeoPackage.
These are proprietary and open standards,
respectively, for storing both vector and raster
data and topologies in an integrated fashion.


Data Compression 


We often compress spatial data files
because they are large. Data compression
reduces file size while maintaining the information
contained in the file. Compression
algorithms may be lossless, in that all information
is maintained during compression, or
lossy, in that some information is lost. A
lossless compression algorithm will produce
an exact copy when a file is compressed and
decompressed. A lossy algorithm will alter
the data on a compression-decompression
round trip. Lossy algorithms are most often
used with image data, where substantial degradation
still leaves a useful image, and are
uncommonly applied to thematic spatial
data, where any data degradation is typically
not tolerated.

Data compression is most often applied
to discrete raster data, for example, when
representing polygon or area information in
a raster GIS. There are redundant data elements
in raster representations of large
homogenous areas. Each raster cell within a
homogenous area will have the same code as
most or all of the adjacent cells. Data compression
algorithms remove much of this
redundancy.

Run-length coding is a common data
compression method. This compression
technique is based on recording sequential
runs of raster cell values. Each run is
recorded as the value found in the set of
adjacent cells and the run length, or number
of cells with the same value. Seven sequential
cells of type A might be listed as A7
instead of AAAAAAA. Thus, seven cells
would be represented by two characters.
Consider the data recorded in Figure 2-48,
where each line of raster cells is represented
by a set of run-length codes. In general,
run-length coding reduces data volume, as
shown for the top three rows in Figure 2-48.
Note that in some instances, run-length coding
increases the data volume, most often
when there are no long runs. This occurs in
the last line of Figure 2-48, where frequent
changes in adjacent cell values result in
many short runs. However, for most thematic
data sets containing area information,
run-length coding substantially reduces the
size of raster data sets.
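
Run-length coding is short enough to sketch directly. The function below is one possible implementation, not the method of any particular GIS; the four rows are the cell values implied by the run-length codes listed in Figure 2-48, with each run written as a (length, value) pair.

```python
from itertools import groupby


def run_length_encode(row):
    """Collapse each run of equal cells into a (run_length, value) pair."""
    return [(len(list(cells)), value) for value, cells in groupby(row)]


rows = [
    [9, 9, 6, 6, 6, 6, 6, 7],
    [6, 6, 6, 6, 6, 6, 6, 6],
    [9, 9, 6, 6, 6, 6, 7, 7],
    [9, 8, 9, 6, 6, 7, 7, 5],
]

for row in rows:
    codes = run_length_encode(row)
    # Each code stores two numbers, so compare 2 * len(codes) with len(row).
    print(codes, "stores", 2 * len(codes), "numbers for", len(row), "cells")
```

The last row has many short runs, so its codes hold more numbers than the uncompressed row, as the text notes.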

There is also some data access cost in
run-length coding. Standard raster data
access involves simply counting the number
of cells across a row to locate a given cell.
To locate a cell in run-length coding we must
sum along the run-length codes to identify a
cell position. This is typically a minor additional
cost, but in some applications the
trade-off between speed and data volume
may be objectionable.

Raster                Run-length codes
9 9 6 6 6 6 6 7       2:9, 5:6, 1:7
6 6 6 6 6 6 6 6       8:6
9 9 6 6 6 6 7 7       2:9, 4:6, 2:7
9 8 9 6 6 7 7 5       1:9, 1:8, 1:9, 2:6, 2:7, 1:5

Figure 2-48: Run-length coding of raster rows. The 2:9 listed at the start of the first line
indicates a run of length two for the cell value 9.
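
The access cost can be seen in a small lookup routine. This is a sketch under the assumption that each row is held as a list of (run_length, value) pairs and that columns are counted from zero; the name cell_value is invented for the example.

```python
def cell_value(codes, column):
    """Find the value at a zero-based column by summing run lengths."""
    if column < 0:
        raise IndexError("column must not be negative")
    end = 0
    for run_length, value in codes:
        end += run_length          # position just past this run
        if column < end:
            return value
    raise IndexError("column is beyond the end of the row")


codes = [(2, 9), (4, 6), (2, 7)]   # the row 9 9 6 6 6 6 7 7
print([cell_value(codes, column) for column in range(8)])
```

An uncompressed row needs a single index operation, while this lookup walks the runs until the running sum passes the requested column.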


Quad tree representations are another
raster compression method. Quad trees are
similar to run-length codings in that they are
most often used to compress raster data sets
when representing area features. Quad trees
may be thought of as a raster data structure
with a variable spatial resolution. Raster cell
sizes are combined and adjusted within the
data layer to fit into each specific area feature
(Figure 2-49). Large raster cells that fit
entirely into one uniform area are assigned
the value corresponding to that area; for
example, the three largest cells in Figure 2-49
are all assigned the value a. Successively
smaller cells are then fit, halving the cell
dimension at each iteration, again fitting the
largest cell that will fit in each uniform area.
This is illustrated in the top-left corner of
Figure 2-49. Successively smaller cells are
defined by splitting "mixed cells" into four
quadrants, and assigning the values a or b to
uniform areas. This is repeated down to the
smallest cell size that is needed to represent
uniform areas at the required detail.


The varying cell size in a quad tree representation
requires more sophisticated
indexing than simple raster data sets. Pointers
are used to link data elements in a tree-like
structure, hence the name quad trees.
There are many ways to structure the data
pointers, from large to small, or by dividing
quadrants, and these methods are beyond
the scope of an introductory text. Further
information on the structure of quad trees
may be found in the references at the end of
this chapter.

Figure 2-49: Quad tree compression.
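
The recursive splitting of mixed cells can be sketched as follows. This is only one of the many possible structures the text alludes to: it assumes a square grid whose side is a power of two, stores uniform areas as single values, and stores mixed cells as a list of four quadrants in the order northwest, northeast, southwest, southeast. The grid and function names are made up for the illustration.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a single value for a uniform block, else four sub-blocks."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first
    half = size // 2  # halve the cell dimension at each split
    return [
        build_quadtree(grid, row, col, half),                # northwest
        build_quadtree(grid, row, col + half, half),         # northeast
        build_quadtree(grid, row + half, col, half),         # southwest
        build_quadtree(grid, row + half, col + half, half),  # southeast
    ]


def count_leaves(node):
    """Count the stored cells, large or small, in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [
    ["a", "a", "a", "a"],
    ["a", "a", "a", "a"],
    ["a", "a", "b", "b"],
    ["a", "b", "b", "b"],
]
tree = build_quadtree(grid)
print(tree)
print(count_leaves(tree), "stored cells instead of", 4 * 4)
```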

There are many other data compression
methods that are commonly applied. LZW is
a lossless compression method commonly
applied to image and raster data sets, particularly
GIF images and TIFF formats. JPEG
and wavelet compression algorithms are
often applied to reduce the size of spatial
data, particularly image or other data,
although as implemented these are lossy
algorithms. Generic bit- and byte-level compression
methods may be applied to any files
for compression or communications. There
is usually some cost in time to the compression
and decompression.
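
A lossless round trip with a generic byte-level method can be demonstrated with Python's standard zlib module. Note that zlib uses the DEFLATE algorithm, not LZW; it stands in here only as a readily available lossless compressor, and the raster row is invented for the example.

```python
import zlib

# A hypothetical raster row: two long homogeneous runs, one byte per cell.
row = bytes([6]) * 1000 + bytes([9]) * 1000

packed = zlib.compress(row)
restored = zlib.decompress(packed)

print(len(row), "bytes before,", len(packed), "bytes after compression")
print("exact copy recovered:", restored == row)
```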


Raster Pyramids 


We sometimes intentionally increase the
size of our raster data sets without increasing
the resolution in a process known as pyramiding.
We create pyramids to increase display
speeds when viewed at small scales
("zoomed out"). Long redraw times often
hinder use of large data sets, particularly
when panning frequently. When displayed at
very small scales, the cell size of a data set
may be smaller than the resolution of the
computer screen. A raster data set 1,000,000
cells across has 1,000 times the data that
can be displayed on a monitor with 1,000
pixel horizontal resolution. However, display
software must wade through all
1,000,000 data elements in a row to pick the
1 cell in 1,000 to display. While clever software
can help, there are limits to how much
we can speed up the redraws.

Pyramiding in effect saves subsampled
copies of the cells at various resolutions. In
our example above, pyramids may do the
equivalent of saving every two, every four,
every 10, every 30, and every 100 cells, all
within the same raster data set. The software
then compares the display scale to the
dimensions of the data set, and chooses the
most appropriate cell resolution to display.
Redraws are much faster, and transparent to
the user.

Note that we say pyramids “in effect" 
save copies of cells at various resolutions. 
This is the simplest method, but often not the 
most efficient for space or speed of access. 
Sophisticated indexing may be used to point 
to the cells at the appropriate resolutions. 

Note that pyramiding comes at a cost, 
both in the size and complexity of the raster 
data set. Indexing schemes complicate the 
simple raster data structure, and the software 
must be able to navigate the indexing 
scheme. Already large raster data sets may 
be inflated from a few percent to several 
times, although in practice it is typically less 
than a doubling of size. 
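
The simplest form of pyramiding, saving subsampled copies, can be sketched for a single raster row. The function names and the rule for choosing a level are assumptions made for this illustration; production software typically uses more elaborate indexing, as noted above.

```python
def build_pyramid(row, factors=(2, 4, 10, 30, 100)):
    """Keep the full row plus one subsampled copy per factor."""
    pyramid = {1: row}
    for factor in factors:
        pyramid[factor] = row[::factor]  # every factor-th cell
    return pyramid


def pick_level(pyramid, screen_pixels):
    """Choose the coarsest copy that still has a cell for every pixel."""
    usable = [f for f, cells in pyramid.items() if len(cells) >= screen_pixels]
    return max(usable) if usable else 1


def extra_storage_2d(factors=(2, 4, 10, 30, 100)):
    """Added size of a two-dimensional raster, as a fraction of the original."""
    return sum(1 / factor ** 2 for factor in factors)


row = list(range(100_000))       # a hypothetical row of 100,000 cells
pyramid = build_pyramid(row)
print(pick_level(pyramid, 1000))  # level to draw on a 1,000 pixel wide screen
print(round(extra_storage_2d(), 3))
```

With these subsampling factors the copies add roughly a third to the size of a two-dimensional raster, consistent with the statement that the increase is typically less than a doubling.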


Common File Formats


A few file formats are commonly used 
to store and transfer spatial data. Some of 
these file structures arose from distribution 
formats adopted by governmental agencies, 
others were specified by software vendors, 
and some have been devised by standards-making
bodies. Some knowledge of the
types and properties of these file formats is 
helpful to the GIS practitioner. 

Common geographic data formats may
be placed into three large classes: raster, vector,
and attribute. Raster formats may be further
split into single-band and multi-band
file types. Multi-band raster data sets are
most often used to store and distribute image
data, while single-band raster data sets are
used to store both single-band images and
nonimage spatial data. Table 2-3 summarizes
some of the most common spatial data formats.

Most GIS software provides some utility
for data import and export in standard formats.


The Geospatial Data Abstraction
Library (GDAL) provides a utility to translate
among many common vector and raster
file formats. This free utility is flexible and
often can be used to extend the reach of
commercial packages, by first using GDAL
to convert files from unsupported to supported
types, and then importing these into
the target software.


Summary


In this chapter we have described the
main ways of conceptualizing spatial entities,
and of representing these entities as spatial
features in a computer. We commonly
employ two conceptualizations, also called
spatial data models: a raster data model and
a vector data model. Both models use a combination
of coordinates, defined in a Cartesian
or spherical system, and attributes, to
represent our spatial features. Features are
usually segregated by thematic type in layers.


Vector data models describe the world
as a set of point, line, and area features.
Attributes may be associated with each feature.
A vector data model splits the world
into discrete features, and often supports
topological relationships. Vector models are
most often used to represent features that are
considered discrete, and are compatible with
vector maps, a common output form.

Raster data models are based on grid
cells and represent the world as a "checkerboard,"
with uniform values within each
cell. A raster data model is a natural choice
for representing features that vary continuously
across space, such as temperature or
precipitation. Data may be converted
between raster and vector data models.

We use data structures and computer
codes to represent our conceptualizations in
more abstract, but computer-compatible
forms. These structures may be optimized to
reduce storage space and increase access
speed, or to enhance processing based on the
nature of our spatial data.

Table 2-3: Common formats for spatial data.

Type and source | Extension | Characteristics (R=Raster, V=Vector, A=Attribute, I=Image)
Comma Separated Value | csv | Common ASCII text format used for attribute and vector information (A, V)
DXF, AutoDesk | dxf | Drawing exchange file, an ASCII or binary file for exchanging spatial data (V)
DWG, AutoDesk | dwg | Native binary file used by AutoDesk to store geographic data and drawings in AutoCAD (V)
Geodatabase, ESRI | gdb, mdb | ESRI container for many data types (R, V, A, I)
GeoJSON, open standard | json, geojson | Open standard for representing and displaying simple geographic features (V, A)
GeoPackage, open standard | gpkg | Open standard for representing vector and raster data, compatible with SQLite (R, V, A)
GeoTIFF, open standard | tif, tiff | An extension for georeferencing Aldus-Adobe public domain TIFF format (R, I)
GPX, open standard | gpx | A specification based on XML for basic GNSS data (V)
Imagine, ERDAS | img | Multiband-capable image format (R)
Interchange, ESRI | e00 | ASCII text file for vector and identifying attribute data
Keyhole Markup Language, Google | kml | XML extension for displaying and annotating features and images (V, I, A)
LAS, ASPRS | las | Laser point cloud data sets (V)
Shapefile, ESRI | shp, shx, dbf, prj, and others | Three or more binary files that include the vector coordinates, attributes, and other information (V)
TIGER, U.S. Census | tgrxxyyy, stfzz | Set of files by U.S. census areas, xx is state code, yyy an area code, zz numbers for various file types (V, A)
MIF/MID, MapInfo | mif, mid | Map interchange file, vector and raster data transport from MapInfo (V, R)
 | | Data formats for scientific data arrays
NLAPS, NASA | various, in a directory | Image data from various Landsat satellites, in a specified directory system (I, R)
SDTS, U.S. Government | none | Spatial Data Transfer Standard, specifies the spatial objects, attributes, reference system (R, V, A)

Study Questions


2.1 - How is an entity different from a cartographic object? 


2.2 - Describe the successive levels of abstraction when representing real-world spatial
phenomena on a computer. Why are there multiple levels, instead of just one level
in a spatial data representation?


2.3 - Define a data model and describe three primary differences between the two
most commonly used data models.


2.4 - Characterize the following lists as nominal, ordinal, or interval/ratio:

a) 1.1, 9.7, -23.2, 04,667

b) green, red, blue, yellow, sepia

c) white, light grey, dark grey, black

d) extra small, small, medium, large, extra large

e) forest, woodland, grassland, bare soil

f) 1, 2, 3, 4, 5, 6, 7


2.5 - Characterize the following lists as nominal, ordinal, or interval/ratio:
a) Spurs, Citizens, Reds, Hornets, Baggies, Toffees, Potters
b) pinch, handful, bucket, bushel, truckload
c) 62.78, 11.05, 19.3
d) gram, kilogram, metric ton
e) Mexico, Canada, Argentina, Guyana, Martinique
f) small, smaller, smallest


2.6 - Indicate which of the following are allowable geographic coordinates:
a) N45° 45' 45"    b) longitude -127.34795°    c) S96° 12' 33"


d) E6° 19' 60"    e) W -12° 23' 55"    f) N 56.9999°


2.7 - Indicate which of the following are allowable geographic coordinates:
a) N145° 45' 12"    b) latitude -62.34795°    c) S110° 52' 43"


d) S49° 15' 60"    e) N 89° 59' 59"    f) S -46.6000°


2.8 - Convert the following degree measures to radians:
а)4728379 5)ISS7M* c)-111.2045° 


Convert the following radian measures to degrees: 
4) 0.0042. €)-126 0225037 


2.9 - Convert the following degree measures to radians: 
DI b)-21.533° — cross 


Convert the following radian measures to degrees:
LIES 0.014 037 


2.10 - Complete the following coordinate conversion table, converting the listed
points from degrees-minutes-seconds (DMS) to decimal degrees (DD), or from
DD to DMS. See Figure 2-10 for the conversion formula.


Point DMS Decimal Degrees

1 36° 45' 12" 36.75333

2 114°582° 

3 85197 

4 1400917 

5 275.500001 

6 0.99528 

۴ 18319227 


2.11 - Complete the following coordinate conversion table, converting the listed
points from degrees-minutes-seconds (DMS) to decimal degrees (DD), or from DD
to DMS. See Figure 2-10 for the conversion formula.


Point DMS Decimal Degrees
1 97° 45' 10" 97.75278
2 12202" 
з 15012” 
4 32219861 
5 15265583 
6 575 
7 23°1250° 
2.12 - Calculate the radian and surface arc distance, in meters, assuming an Earth
radius of 6,378.1 kilometers.
Arc Angle | Arc 
Point | (se) | Radions | Distance (m) 
1 10  (|00145203| 1113188 
2  |001666667| 
з  |o00027778 
4 | 321611111 | 
5 o5 | 
6 00125 | 
2.13 - Calculate the radian and surface arc distance, in meters, assuming an Earth
radius of 6,378.1 kilometers.


Arc Angle Arc 
Point | (гес degs) | Radians | Distance (m) 


1 |007083333|000123628 78851 
2  |ooseéium 
3 [0225138889 
4 30 
5 
6 


100 
075 


2.14 - Assume a spherical Earth with a radius of 6378.0 km. Calculate the great circle
distances from St. Paul, Minnesota, latitude 44.9537°, longitude -93.09°, to the following
points:

a) Chicago, latitude 41.5781°, longitude -87.6298°

b) Reykjavik, latitude 64.1265°, longitude -21.8174°

c) Buenos Aires, latitude -34.6037°, longitude -58.3816°


2.15 - Assume a spherical Earth with a radius of 6378.0 km. Calculate the great circle
distances from St. Paul, Minnesota, latitude 44.9537°, longitude -93.09°, to the
following points:

a) New York, latitude 40.7128°, longitude -74.0059°

b) Paris, latitude 48.8566°, longitude 2.3522°

c) Tokyo, latitude 35.6895°, longitude 139.6917°


2.16 - What is vector topology, and why is it important? What is planar topology, and
when might nonplanar be more useful than planar topology?
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Er Tienaa themes ots той оббо Конн vache ok кө Ыла 
the building outlines (redidarker) and property parcels (green/light) polygon la 
Nees а eat ی ا‎ EE fs е аа Шш 


€) A building must not span a parcel boundary. 
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злв- ا ا‎ Чыке SS aa کا ت پا‎ ров 
E pneter i ay > 


a) Buildings must not overlap. 

b) Parcels must not overlap. 

c) Parcels must not have gaps.

d) Buildings must be entirely within the parcel layer.
e) A building must not span a parcel boundary.


2.19 - Draw multi-part and single-part tables for the United Kingdom, recording the
labeled countries.


2.20 - Draw multi-part and single-part tables for Italy, recording the Italian mainland
and labeled major islands.

2.21 - What are the respective advantages and disadvantages of vector data models
vs. raster data models?


2.22 - Under what conditions are mixed cells a problem in raster data models? In
what ways may the problem of mixed cells be addressed?


2.23 - List the labels (A, B, C, etc.) of raster cells that would be assigned as water
(dark, blue) and not land (light, green) under a majority coverage rule.


2.24 - List the labels (A, B, C, etc.) of raster cells that would be assigned as water
(dark, blue) and not land (light, green) under a majority coverage rule.


2.25 - The following figure shows change in raster resolution, combining four small
cells on the left to create an output for each corresponding larger cell on the right. Fill
in the two rasters on the right, for the interval ratio data (top), and the nominal data
(bottom). Assume null values are not ignored, and a majority rule for nominal data.


|] 


«|||» 
pM 


5 
о[е[о[о 
о[е[е[е 

CEGE 


Is 


2.26 - The following figure shows change in raster resolution, combining four small 
cells on the left to create an output for each corresponding larger cell on the right. Fill 
in the two rasters on the right, for the interval ratio data (top), and the nominal data 
(bottom). Assume null values are not ignored, and a majority rule for nominal data. 


BANE 
feu 2| 2| 2 
|а| 3 
3l 3| 4 
а[а[ь[а 
a|b|c|b & 
nal c| cle 
clefele


2.27 - Complete the one-to-one (top) and many-to-one (bottom) raster tables in the
figure below:


attribute table 
(rows rst start 
upper-left corner) 

ПИЕ 

HEBE 

JEDE + 

DEBE 
CEEE] 

Hnn 

a[z[2[2 

ПЕЕ 

NEBE 


2.28 - Complete the one-to-one (top) and many-to-one (bottom) raster tables in the 
figure below: 


attribute table 
(rows first stort 


5[7|2 
9 


8 
7 
8


2.29 - What is a triangulated irregular network?


2.30 - What are the main concepts behind object data models, and how do they differ 
from other data models? 


2.31 - Why do we use binary numbers in computers? 
2.32- Express the following base 10 numbers in binary notation: 


аі b)23 ©) 256 94 
әп 910 93 БЕ] 
2.33- Express the following base 10 numbers in binary notation: 
22 bs 99 417 
D 0128 92 h)i9 
2.34 - Express the following binary numbers in base 10 notation: 
a) 0101 boo! сип a) 00101101 
«по pion g) 10000001 b) 11111111 
2.35 - Express the following binary numbers in base 10 notation: 
a) nio b) 1001 — ooo 4) 10000101 
e) 1000 191010 g) 10010001 h) 11110000 


2.36 - Why do we need to compress data? Which are most commonly compressed,
raster data or vector data? Why?


2.37 - What are pointers when used in the context of spatial data, and how are they 
helpful in organizing spatial data?


2.38 - Write the run-length coding for each of the rows in this raster:


ь ьа а [а 


а 


b|b|bj|b|b|bj|bj|bj|b 


a 


2.39 - Write the run-length coding for each of the rows in this raster:


3 Geodesy, Datums, Map Projections, and Coordinate Systems


Introduction 


GIS are different from other information
systems because they include coordinates.
Proper use of GIS requires
understanding how coordinate systems are
established, how coordinates are measured
on the Earth's curving surface, and how
these coordinates are converted for use in
flat maps. This chapter introduces geodesy,
the science of measuring Earth shape and
locations, and map projections, the transformation
of coordinate locations from the
Earth's curved surface onto flat maps.


As described in Chapter 1, GIS practitioners
commonly use two kinds of coordinate
systems. The first is a spherical
coordinate system with latitudes and longitudes
on the surface of an ellipsoid-shaped
globe. Geodesists measure, combine, and
optimally estimate a set of latitude/longitude
coordinates using our most accurate
and modern technologies. Taken together,
the precise coordinates at these high-accuracy
points and the mathematically specified
globe define a geodetic datum. Datums
are the foundation of subsequent positions,
with all other measurements made relative
to datums.


Spatial data spanning large distances
are often specified in latitude/longitude
coordinates. True surface distance, flight
distance and direction, and large areas are
more accurately specified when factoring
in the Earth's curvature.


Most data and analysis in GIS use two-dimensional,
Cartesian coordinate systems.
These systems are created via map
projections, the systematic mathematical
rendering of latitude/longitude coordinates
from 3-D spherical systems to 2-D Cartesian
systems. Every map projection must
distort surface geometry in some way due
to the Earth's curvature. When we plot latitude
and longitude coordinates on a Cartesian
system, "straight" lines will appear
curved and polygons will be distorted. This
distortion may be difficult to detect on
maps that cover a small area, but the distortions
become apparent as the mapped area
grows. We limit the distortion error by
carefully selecting our map projections,
and by limiting the area over which we
apply any one given map projection. We
tailor the type, number, and extent of our
map projections to manage positional error
according to our spatial error tolerance.

Historically, the irregular shape of the
Earth and sparse, unconnected survey networks
were large sources of uncertainty in
defining coordinate systems, coupled with
limits on our measurement technologies.
For centuries we assumed a spherical or
ellipsoidal shape for the Earth, but departures
of the real Earth shape from these idealized
globes led to different estimates of
coordinate locations. Surveys were difficult
over large areas, depending on optical and
laborious physical measurements. In one
section of the Great Arc survey across
India, over half the surveyors died of illness,
accidents, or predation. As our measurements
and analysis slowly improved, we
were able to quantify and integrate Earth's
irregular shape, and our positional estimates
and accuracies improved through time.

When sufficient new survey data accrue,
we perform network-wide datum adjustments.
The datum adjustment reconciles
errors across the set of measurements, weeding
out blunders and mathematically distributing
uncertainty across the network. A
datum adjustment is viewed as our best estimate
of measured locations up to that time.
Each datum adjustment results in a re-calculation
of coordinates for all existing datum
points, as new points influence the estimates
of previous locations.

Coordinate systems are also complicated
by large tectonic crustal movements.
Continent-sized plates on the Earth's surface
drift, rotate, and rise or fall, meaning all
coordinates change through time. In the past
this went unaccounted for
because we couldn't detect continental drift.
Widely available technologies now provide
centimeter-level accuracies over global distances,
so most of us can measure and adjust
for tectonic plate movements. We must consider
the epoch, or time of positional measurement,
to accurately place coordinates.

There is another factor that adds confusion.
We may choose many different ways to
project coordinates from our curved Earth
ellipsoid onto a flat, Cartesian map surface.
We choose different map projections to manage
this spatial distortion. Interpreting a data
set's coordinate system can be confusing
without knowledge of map projections.

An example may help. Figure 3-1 shows
the location of a U.S. survey mark, a precisely
surveyed and monumented point.
Coordinates for this point, measured by federal
and state government surveyors, are
shown at the top right of the figure. There
are four different versions of the latitude/longitude
location for this point. The GIS
practitioner may well ask, which latitude/longitude
pair should I use? This chapter
should allow you to decide.

Note that there are also several versions
of the Cartesian x and y coordinates for the
point in Figure 3-1. The differences in the
coordinate values are too great to be due
solely to datum differences or measurement
errors, but rather, are due primarily to the
map projection used.

Figure 3-1: An example of different coordinate values for the same point, from a National
Geodetic Survey (NGS) data sheet and from data layers. We often find multiple latitude/longitude
values (surveyor data, top), or X and Y values for the same point (surveyor data, or from data
layers, bottom).

We want spatial data in any analysis to
properly align, so we usually transform all
data to an identical coordinate system. This
means all data are expressed in a coordinate
system using the same datum, of the same
epoch, and if applied, using the same map
projection. In our example, we might specify
all data should be in the SPC MNS meters
coordinate system, based on the
NAD83(2011) datum, for epoch 2020.0. If
not, data may not fall in their proper relative
location. Failure to adjust for datum and
coordinate system differences is the root of
many errors in spatial analysis. As a rule,
you should know the coordinate system used
for all of your data, and convert all data to
the same coordinate system, for the same
time period, prior to analysis. In some cases
we may get away with improperly selected
coordinate systems or ignoring datum or
epoch differences, e.g., when our positional
error tolerances are several meters or tens of
feet and our data are collected over a short
time period. As positioning technology
improves, we can make increasingly accurate
and precise measurements, so in many
cases the measurement date becomes
important. This chapter describes how we
define, measure, and convert among coordinate
systems.


Surface and Ellipsoidal Coordinates


Latitude/longitude values are defined on
a reference ellipsoid. While we make most
of our measurements at or near the surface of
the Earth, we mathematically transfer the
latitudes and longitudes up or down from the
measurement location onto the ellipsoid
(Figure 3-2). The transferring line is at right
angles, or “normal” to the surface of the
ellipsoid. All of our horizontal measure-
ments must be reduced to the ellipsoid sur-
face.


We apply the same latitude and longi-
tude to all points along the normal line
through the ellipsoid, a ray from the ellip-
soid up through our surface point. All
objects on this ray will have the same lati-
tude and longitude, for example, a point on
the ground and a plane flying above that
point. Note that because of ellipsoid eccen-
tricity, this surface-perpendicular ray does
not pass through the origin, except in a
few special locations (poles and equator).
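
The geometry described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: it assumes the GRS80 ellipsoid constants, and the function names (geodetic_to_ecef, normal_distance_from_origin) are our own. It converts latitude, longitude, and ellipsoidal height to Earth-centered X, Y, Z, so that a ground point and an aircraft directly above it differ only by a step along the ellipsoid normal, and it computes how far the normal line misses the Earth-centered origin.

```python
import math

# GRS80 ellipsoid constants (meters); assumed here for illustration.
A = 6378137.0                 # semi-major axis
F = 1.0 / 298.257222101       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared


def prime_vertical_radius(lat_rad):
    """Distance along the normal from the ellipsoid surface to the polar axis."""
    return A / math.sqrt(1.0 - E2 * math.sin(lat_rad) ** 2)


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Latitude, longitude (degrees) and ellipsoidal height (m) to X, Y, Z (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    n = prime_vertical_radius(lat)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + h) * math.sin(lat)
    return x, y, z


def normal_distance_from_origin(lat_deg):
    """Shortest distance (m) between the origin and the normal line at this latitude."""
    lat = math.radians(lat_deg)
    return prime_vertical_radius(lat) * E2 * abs(math.sin(lat) * math.cos(lat))


if __name__ == "__main__":
    # Same latitude/longitude, different heights: a ground point and a plane above it.
    print(geodetic_to_ecef(45.0, -93.0, 0.0))
    print(geodetic_to_ecef(45.0, -93.0, 10000.0))
    # The miss distance is zero only at the equator and the poles.
    for lat in (0.0, 45.0, 90.0):
        print(lat, normal_distance_from_origin(lat))
```

At mid-latitudes the miss distance is on the order of tens of kilometers, which is why the normal cannot be treated as a line through the Earth's center.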


Modern Coordinate Capture, 
Coordinate Systems, and Datums 


Positions on or near the surface of the
Earth are referenced to an ellipsoid with a
specified origin and major and minor axes, and
in equivalent Earth-centered, three-dimen-
sional, Cartesian coordinate systems, the X,
Y, and Z of 3-D systems described in Chap-
ter 2. A specific, defined version of a 3-D
system with a set of defined point locations
is called a datum. Datums underpin all geo-
graphic measurements. Datums are
improved through time (Figure 3-1) and dif-
fer by polity, so data in different datums may
not match correctly.

Most GIS data collection today relies
directly or indirectly on satellite-based posi-
tioning systems, such as the U.S. Global
Positioning System (GPS, described in detail
in Chapter 5), or more generically one of
several Global Navigation Satellite Systems
(GNSS). Coordinates measured by GNSS
are provided with reference to a specified
datum. GNSS allow the rapid, accurate col-
lection of locations, measured against a
datum, any time we have a clear view of the
sky. Data interpreted from aerial or satellite
images depend on GNSS because drones,
planes, and satellites usually determine their
locations from GNSS.


A set of Continuously Operating Refer-
ence Stations (CORS) in the U.S. provides
coordinates in a datum labeled as
ITRF(xxxx), where ITRF stands for the
International Terrestrial Reference Frame,
and xxxx represents a version number. In
most of North America, much collected data
are converted to a different datum, currently
one labeled as NAD83(yyyy), where
yyyy is a version designator, e.g., by a name,
NAD83(CORS96), or a year, NAD83(1996).
Other GNSS (Russian GLONASS, Chinese
BeiDou, European Galileo) typically report
an ITRF version, and may be transformed
to different local datums. How we develop
datums, why we have different datums, and
how we convert among datums are
explained in the following sections.


Datum Ellipsoids 


Horizontal geodetic datums are defined
with a reference ellipsoid and a set of datum
points specified on that ellipsoid. We refer-
ence all other locations to this ellipsoid and
datum points. To establish a datum, we need
to determine the size, shape, and orientation
of the Earth's ellipsoid, orient a latitude/lon-
gitude system on this ellipsoid, and establish
the precise locations of our datum points.
Errors or uncertainties in the reference ellip-
soid or the datum points set a limit on the
accuracy of any subsequent measurements,
so geodesists (scientists specializing in mea-
suring the Earth) strive for millimeter-level
accuracies.


Humans have long labored to estimate
the size and shape of the Earth. The Ancient
Greeks deduced the Earth's shape was a per-
fect sphere, and were among the first to
quantify the sphere's size. They measured
locations on the Earth's surface relative to
the Sun or stars, reasoning these provided a
stable reference frame. Measuring against
celestial bodies underlies most geodetic
observations taken over the past 2,000 years.

Eratosthenes performed early measure-
ments of the Earth's circumference. He
observed that on one day each year, the noon
sun was directly overhead at ancient Syene,
near present-day Aswan, Egypt, because it
reached the bottom of a deep well. He also
observed that 805 km north, at exactly the
same date and time, a vertical post cast a
shadow. The shadow-post combination
defined an angle that was about 7°12' or
about 1/50th of a circle (Figure 3-3). Eratos-
thenes deduced that the Earth must be 805
multiplied by 50, or about 40,250 kilometers
in circumference. His estimate is within 4%
of modern estimates.
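
Eratosthenes' arithmetic is easy to reproduce. The short sketch below uses the distance and angle given in the text; the modern comparison value (about 40,008 km for a circumference through the poles) and the function names are our own assumptions, added only to check the stated agreement.

```python
def eratosthenes_circumference(distance_km, shadow_angle_deg):
    """Scale a measured arc up to a full circle: distance / (angle / 360)."""
    fraction_of_circle = shadow_angle_deg / 360.0
    return distance_km / fraction_of_circle


def percent_difference(estimate, reference):
    """Absolute difference between two values as a percentage of the reference."""
    return 100.0 * abs(estimate - reference) / reference


if __name__ == "__main__":
    angle_deg = 7.0 + 12.0 / 60.0          # 7 degrees 12 minutes, from the text
    estimate_km = eratosthenes_circumference(805.0, angle_deg)
    modern_km = 40008.0                    # assumed approximate polar circumference
    print(estimate_km, percent_difference(estimate_km, modern_km))
```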


Mathematicians in the 1700s posited
that centrifugal forces should cause the
equatorial regions of the Earth to bulge.
They proposed the Earth would be better
modeled by an ellipsoid, a sphere slightly
flattened at the North and South Poles (Fig-
ure 3-4). As noted in the figure and in Chap-
ter 2, the ellipsoid has two characteristic
dimensions: the semi-major axis, the radius
a in the equatorial direction, and the semi-
minor axis, the radius b in the polar direc-
tion. This difference in polar and equatorial
radii is also described as a flattening factor,
shown in Figure 3-4. There was substantial
disagreement on the existence and amount of
flattening. Complex, repeated, and highly
accurate measurements established that an
ellipsoid was the best geometric model of
the Earth's surface (Figure 3-5), because the
Earth's curvature was greater near the Equa-
tor than at higher latitudes.

Figure 3-4: An ellipsoidal model of the Earth's
shape. An ellipsoid is defined in part by two
radii, a and b.

Measurement efforts through the 19th
and 20th centuries led to a set of official
ellipsoids with various equatorial and polar
radii. Because early surveys could not span
the oceans, ellipsoidal parameters were fit
for each country, continent, or comparably
large survey area. The Clarke 1866 ellipsoid
was commonly used in North America, and
is flatter than the ellipsoid we use today. The
Bessel ellipsoid, common in Europe, also
specified radii somewhat different than
today's best global estimates.

Since the 1980s, improved measure-
ments have yielded extremely accurate
global measurements. Ellipsoids such as the
GRS80 provide a "best" overall fit to
observed measurements across the globe,
and are now widely used.
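
The flattening factor makes these ellipsoid comparisons concrete. The sketch below uses the standard definition f = (a - b) / a with published semi-major and semi-minor axes for the Clarke 1866 and GRS80 ellipsoids; the axis values are supplied here as assumptions, since the text itself does not list them.

```python
def flattening(a, b):
    """Flattening factor from the semi-major (a) and semi-minor (b) axes."""
    return (a - b) / a


# Semi-major and semi-minor axes in meters (published values, assumed here).
CLARKE_1866 = (6378206.4, 6356583.8)
GRS80 = (6378137.0, 6356752.3141)

if __name__ == "__main__":
    for name, (a, b) in (("Clarke 1866", CLARKE_1866), ("GRS80", GRS80)):
        f = flattening(a, b)
        # Inverse flattening is the form usually quoted in ellipsoid tables.
        print(name, f, 1.0 / f)
```

A larger f (equivalently, a smaller inverse flattening) means a flatter ellipsoid, consistent with the statement that the Clarke 1866 ellipsoid is flatter than the one in use today.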


Modern Datum Definition 


To create a datum, we must complement
our accurate ellipsoid by establishing a lati-
tude/longitude net on the Earth's surface,
and estimating the coordinates for a set of
precisely measured datum points in this net.
Historically, we've established our latitude/
longitude system relative to far-off stars. We
use celestial bodies because their relative
positions are invariant through time: the
angle measured between two stars is
unchanged over the course of human history.
By making precise measurements to stars
from different locations on Earth we can
establish the relative position of points on
the Earth's surface. Our zero longitude
passes near the Greenwich Observatory
because it was the first place with a large
enough catalog of celestial measurements to
calculate latitudes and longitudes over much
of the Earth.

Establishing latitude/longitude lines and
measuring point coordinates is complicated
because the Earth system moves in multiple
dimensions. The Earth spins on its axis, with
some precession or “wobble,” while at the
same time orbiting the Sun. Although
unknown prior to the 1960s, large tectonic
plates slide, twist, and tilt through time over
the surface of the Earth, so any object fixed
to the surface of the Earth moves relative to
most other fixed objects.


Before we discovered plate tectonics,
we assumed the continents were stationary
and that we could establish a latitude/longi-
tude grid, fixed for all time with a zero lon-
gitude passing through the Greenwich
Observatory. Most points were considered
static, and the main problems were thought
to be improving measurement density, accu-
racy, and coherence across all continents.
The discovery of large plate movement com-
plicated this notion. Figure 3-6 illustrates
point movement across time periods, called
epochs in geodesy. It shows a continuously
operating, fixed GPS station's drift through
time, in this case moving more than a meter
(3 feet) over 26 years. GIS data collected
with an unknown epoch over this period
could diverge by over three feet from the
same point measured at a different time.
Misalignment will be greater for future
measurements, and any data collected at dif-
ferent times won't overlay correctly. Digi-
tized locations of power poles might end up
three feet into a road in an independent
streets layer, or buildings might appear to be
in rivers.

Figure 3-6 (panel title): Point position, epoch,
and shift through time, Trak Point, Orange
County, CA.

Given plate movements, how do we
establish a coherent datum framework for
measuring locations across the globe? Most
countries are adopting the concept of
dynamic datums and fixed epoch position-
ing, where we keep track of the epoch of
coordinate measurement so that data may be
transformed to a common time. We measure
the location of the continents through time to
millimeter accuracies and we can calculate
the shift of coordinate locations from any
one time period to another using
models. These models are based on a series
of measurements of the speed and direction
of drift for a network of points. As with dead
reckoning, we can extrapolate future posi-
tion based on past velocities and directions,
filling in gaps between point re-measure-
ments. Our data can then overlay to within
the accuracies of our measurements, as
we've removed most of the positional differ-
ence due to tectonic plate movements.
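
The dead-reckoning idea can be sketched as a constant-velocity shift between epochs. This is a simplified illustration, not an official velocity model: the function names are our own, and the velocity components below are hypothetical values chosen only so that the total drift is roughly a meter over 26 years, similar in size to the station drift described earlier.

```python
import math


def propagate_position(position_m, velocity_m_per_yr, from_epoch, to_epoch):
    """Shift coordinates from one epoch to another assuming constant velocity."""
    dt_years = to_epoch - from_epoch
    return tuple(p + v * dt_years for p, v in zip(position_m, velocity_m_per_yr))


def shift_length(a, b):
    """Straight-line distance between two coordinate tuples."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical local east/north coordinates (m) and velocity (m/yr).
    at_1994 = (0.0, 0.0)
    velocity = (-0.030, 0.028)
    at_2020 = propagate_position(at_1994, velocity, 1994.0, 2020.0)
    print(at_2020, shift_length(at_1994, at_2020))
    # Moving data back to a common epoch reverses the shift.
    print(propagate_position(at_2020, velocity, 2020.0, 1994.0))
```

Running the same function with the epochs swapped brings a later measurement back to the earlier epoch, which is the operation needed to overlay data collected at different times.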

Most modern datums are at their base
dependent on a global measurement infra-
structure that consists of four main systems
(Figure 3-7): Very Long Baseline
Interferometry (VLBI), Satellite Laser
Ranging (SLR), Doppler Orbitography and
Radiopositioning Integrated by Satellite
(DORIS), and Global Navigation Satellite
Systems (GNSS). These systems allow us to
precisely determine point locations and shifts on
tectonic plates through time.

VLBI may be viewed as the base mea-
surement in establishing our datums. A
global network of 30+ radio telescopes
receives signals from distant quasars (Figure
3-7). The radio signals vary so that a specific
pattern released by the quasar at one time may
be recorded at different times across the
globe. Correlated signal analysis
allows calculating the relative position
between these radio telescopes at any one
time to within millimeters, or hundredths of
inches. Further, VLBI and related technolo-
gies can measure station velocities, both
absolutely, and between any one station and
any other. This provides highly accurate
measurements of the Earth's rotation and
variation, the orientation of the polar axis,
the relative positions of the stations on the
continents, and crustal movements, again, all
at millimeter accuracies.

Figure 3-7: Systems used to establish terrestrial
reference frames and hence datums (panel title:
Fundamental Measurements for the ITRS/ITRF
Systems).


These positional measurements are aug-
mented by other systems, most notably
DORIS, GNSS, and SLR. The DORIS sys-
tem measures Doppler shifts in signals sent
to satellites, which allows us to precisely
locate approximately 100 ground stations.
GNSS measure signals broadcast from con-
stellations of satellites to networks of hun-
dreds of fixed receivers stationed across the
globe, quantifying station location through
time. The SLR measures laser travel times to
and from orbiting satellites to estimate SLR
station locations and movements, particu-
larly vertical changes associated with crustal
deformation, at near millimeter levels.


Taken together, these measurements are
the backbone of the International Terrestrial
Reference System (ITRS), and allow the cal-
culation of the recognized global datums
known as the International Terrestrial Refer-
ence Frames (ITRF), mentioned earlier.
These reference frames define the precise
locations and velocities of stations as datum
points relative to a standard latitude/longi-
tude net, at a given point in time (Figure 3-
8). This last qualification is important, that
the points are only specified at a given point
in time, or measurement epoch, because the
system is dynamic: the stations are all mov-
ing relative to each other. We can specify
the velocities and directions of each mea-
surement point, in effect quantifying conti-
nental drift (Figure 3-8). We can
approximate where any point was at a previ-
ous time by applying these velocities to posi-
tional measurements, subtracting to adjust
for intervening drift. ITRF datums are calcu-
lated periodically, and include estimates of
both the positions and velocities of all points
so that we may combine data across different
measurement epochs.

Figure 3-8: Tectonic plates for a portion of the
Earth.

Different versions of the ITRF are noted
by their year of calculation, for example,
ITRF89, ITRF90, ITRF91. Each includes
the X, Y, and Z location of each measure-
ment station and the velocity of each station
in three dimensions. Before VLBI and
related technologies, particularly GNSS, we
had no accurate way to widely and inexpen-
sively account for tectonic coordinate shifts.
GNSS changed that, and we must now
record both the location and the time of mea-
surement if we are to precisely establish
positions within a datum, and to compare
locations through time.


NAD83(2011) 


The NAD83(2011) is the most recent,
official horizontal datum in the U.S. It is
based on a comprehensive analysis of GNSS
data, combined with continent-wide surface
surveys. Central to the datum are measure-
ments from fixed GNSS measurement instal-
lations, e.g., the U.S. Continuously Operating
Reference Stations (CORS, Figure 3-9), a
network of several hundred fixed GPS sta-
tions distributed across North America,
or the Canadian Active Control Stations
(CACS) in Canada, where the datum is
known as NAD83(CSRS). NAD83(2011) is
based on the GRS80 ellipsoid, but the center
of the ellipsoid is offset by more than a
meter from our best estimate of the Earth's
center of mass. This was adopted to maintain
consistency with preceding datums, and to
reduce additional computations and confu-
sion among users.

The NAD83(2011) includes recalculated
locations of tens of thousands of legacy
“passive” control points, the traditional start-
ing points for most local surveys. These sur-
vey marks were monumented to serve as
local starting points, or bench marks when
part of precise vertical surveys. Marks often
consist of a metal disk embedded in rock or
concrete (Figure 3-10). Their coordinates
and other characteristics are maintained,
often in an online-accessible database. The
median network accuracy for control points
was reported as better than 1 cm.


A series of horizontal datums was
released for North America between 1989
and 2007 as precursors to
NAD83(2011), remarkable given that almost
sixty years passed between prior datum
updates. This is because GPS became widely
available toward the end of the 1983 update.
The GPS data had higher accuracies than
those measured using traditional methods,
with improved speed and lower costs. How-
ever, a nationwide integration of existing
surveys and new GPS observations was
untried, with unknown cost and speed. In
response, NGS collaborated with states to
create limited-area High Accuracy Reference
Networks (HARN), also known as High
Precision Geodetic Networks (HPGN). Gen-
erally, there is a different NAD83(HARN)
for each state or small group of states. The
HARN and subsequent NAD83 nationwide
adjustments are largely satellite-based, and
mark the transition from physical and optical
surveying to GPS/GNSS surveying. As the
GPS-CORS network was extended, it
allowed more frequent datum improve-
ments, and geodesists assimilated the new
data and methods into a coherent continental
analysis, over a series of datums:
NAD83(CORS93), NAD83(CORS94),
NAD83(CORS96), NAD83(NSRS2007),
and finally the NAD83(2011) we use today.
Note that the NGS and others sometimes use
alternative designations with only the datum
date, e.g., NAD83(1996) is equivalent to
NAD83(CORS96).


NATRF2022 


The United States is transitioning to a
set of satellite-based, dynamic reference
frames, called the North American Terres-
trial Reference Frame of 2022
(NATRF2022). This system is substantially
different from the NAD83(2011), and differ-
ent from but related to the ITRF. As with the
ITRF, the NATRF2022 also integrates time-
dependency in datum points, and is based in
part on ITRS measurements to determine
positions. The system identifies datum
points with epoch coordinates, reporting
point locations for a specific date and time, and
calculates periodic updates for the point
coordinates, e.g., at five- or ten-year intervals.
Practitioners may use epoch coordinates as a
basis for subsequently positioning surveys or
may adjust to a standard epoch.


NATRF2022 will also identify continu-
ously measured datum points, providing
near real-time coordinates and velocities for
a set of locations. Most of these locations
will coincide with CORS base stations. If
using appropriate GNSS/GPS technologies,
new field data may reference the CORS base
stations directly, calculating positions for the
current time and date. The epoch may be
recorded in the metadata, and positions may
be easily adjusted to an epoch corresponding
to other data.

The NATRF2022 and ITRF coordinates
typically are not the same, although they
both at their base include measurements
from the International Terrestrial Reference
System. The ITRF uses a larger set of sta-
tions to estimate continental plate dynamics
and average the plate movements, and hence
the two systems use different relative point
velocities. The U.S. NGS adopts a conven-
tion that reduces coordinate velocities in
North America, and hence lengthens the
interval between required coordinate recal-
culation. Some background helps understand
this choice.


If all our modern datums held the longi-
tude fixed for a zero line through the Green-
wich Observatory, then all other point
coordinates across the globe would drift with
the continents, some quite rapidly, and in
different directions. Drift would be lower in
England, and largest for the plate moving
fastest relative to England. This wouldn't be
the best choice for most countries, because it
would require more frequent calculations of
new positions. For example, in projects
where accuracies of 30 cm would suffice,
with a Greenwich-fixed 0 longitude, the con-
tinental drift would surpass common accu-
racy thresholds in a decade or less, and data
would misalign much sooner for much of the
globe. We could back-calculate to a standard
epoch so that data matched, but would have
to recalculate any time data were collected
more than a few years apart, increasing work
and the chance for confusion or errors.
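
The "decade or less" figure follows from simple division of the accuracy threshold by the drift rate. The sketch below shows that arithmetic; the function name and the drift rates are hypothetical illustrations, not measured plate velocities.

```python
import math


def years_until_misaligned(tolerance_m, drift_m_per_yr):
    """Years of drift needed before the accumulated shift reaches the tolerance."""
    if drift_m_per_yr == 0.0:
        return math.inf  # a point with no drift never exceeds the tolerance
    return tolerance_m / abs(drift_m_per_yr)


if __name__ == "__main__":
    tolerance = 0.30  # 30 cm accuracy requirement, from the text
    for rate in (0.03, 0.07):  # hypothetical drift rates in m/yr
        print(rate, years_until_misaligned(tolerance, rate))
```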


For this and other reasons, neither the
ITRF nor the NATRF2022 maintains a zero
longitude through the Greenwich Observa-
tory. Rather, both the NATRF2022 and ITRF
systems adopt latitude/longitude nets tied to
celestial frames. These latitude/longitude
nets have an origin at the mass center of the
Earth, spin around the pole, and orbit the
Sun according to precise measurements and
models. We might think of these as an imagi-
nary wire frame of latitude and longitude
lines that are tied to the mass center of the
Earth, with the tectonic plates sliding under
the wire frame. The zero longitude passes
near, but not through the Greenwich Obser-
vatory, and this longitude drifts slowly
through time, as the plate containing the
Greenwich Observatory drifts under our lati-
tude/longitude frame. All points will drift
with their plates.

A main question is to what do we attach
the latitude/longitude wire frame? The larg-
est NATRF2022 datum is tied to the main
North American Plate, meaning that in a sta-
tistical sense the net movement for a set of
points distributed across the North American
plate averages to zero. Points on this plate
will move, but by small amounts, and bal-
ance each other out across the plate. This
yields relatively slow velocities for points on
the main plate, yielding measured coordi-
nates that have small epoch differences for
longer periods. There is a less frequent need
to adjust NATRF2022 coordinate locations
on the main North American plate to account
for continental drift. Under this scheme,
coordinates on other plates move at rela-
tively higher velocities, but the NATRF2022
is designed for U.S. territories.

The ITRF datums are different. Move-
ments are averaged globally, so the summed
movement of a set of points distributed
across all the Earth's large plates is zero.
This means that even though the ITRF and
NATRF2022 may initially align, they will
diverge over time. At some time in the
future, the government bodies will have to
adjust for this divergence, but that is pro-
jected to be several decades hence, given the
estimated rates of divergence.


ETRS89


The European Terrestrial Reference Sys-
tem of 1989 (ETRS89) is the most common
positioning base over much of Europe, and
the origin for many national datums on that
continent. It takes a parallel approach to the
NATRF2022, in that the system is tied to the
main European tectonic plate, and is based
on ITRS measurements at epoch 1989.0.
This maintains low coordinate velocities
across this main European plate. As with
NATRF2022, this means time between
adjustments for continental drift is extended
when working on the European plate. Coor-
dinate shifts measured by ITRS may be up to
centimeters a year, while in the ETRS89
they are typically millimeters per year. Data
may be transformed to a common epoch so
that they overlay correctly.

There have been several updates to the
ETRS89, with new and refined coordinates
measured in 2000, 2005, and 2008, and there
will likely be periodic additional adjust-
ments. These are based on the ITRS and the
EPN, a permanent network of GNSS stations
spread across Europe, in much the same way
as the CORS stations are the basis for posi-
tioning in North America. By convention,
the updated datums are labeled
ETRF2000(Rxx), where xx is the year of
adjustment.


WGS84 Datums 


The World Geodetic System of 1984
(WGS84) is a third set of datums, developed
and primarily used by the U.S. Department
of Defense (DOD). It was introduced in
1987 and is used in most DOD maps and
positional data. The datum is periodically
updated, e.g., WGS84(G1674) and
WGS84(G1762), and there will likely be
more adjustments in the future. The ITRF
and WGS84 datums have been aligned since
1995, and can be considered equivalent for
most, but not all, GIS applications. Differ-
ences between these datums are generally
only a few centimeters since 1995.


Legacy, Pre-Satellite Datums 


While many countries have or are in the
process of transitioning to satellite-based,
dynamic datums as described above, some
new data and most existing data use coordi-
nates referenced to older, legacy datums.
GIS users should be familiar with both new
and old datums until we convert all our old
data and collect all our new data in the new
dynamic datums.

Geodetic surveys in the 18th and 19th
centuries combined ground surveys with
optical astronomical observations to develop
datums. Figure 3-11 shows an example sur-
vey, employing a network of interlocking tri-
angles to determine positions at survey
stations. Triangulation surveys relied on
optical angle measurement and few surface
distance measurements, an advantage given
surface measurements were difficult and
expensive when many datums were first
developed. Triangulation also improves
accuracy; because there are multiple mea-
surements to each survey station, the loca-
tion at each station may be computed by
various paths.

Although rarely encountered today,
there was a U.S. Standard Datum in use up
to the early 20th century, replaced by the
North American Datum of 1927 (NAD27).
The NAD27 was the first network-wide least
squares adjustment, fixing a survey station
in Kansas as a starting point.

Its successor was the North American
Datum of 1983 (NAD83), first released in
1986. Because there were several rapid
updates due to the rapid adoption of GPS,
we place a modifier in parentheses after the
NAD83 designator. NAD83(1986) indicates
the original version, the datum adjustment
with limited GPS observations. The original
NAD83(1986) used an Earth-centered refer-
ence, rather than fixing a surface station as
with NAD27. Coordinate shifts from
NAD27 to NAD83(1986) were large, often
tens to hundreds of meters. In most
instances, the coordinates changed because
our measurement methods and number of
stations improved. As described above, there
were several iterations of the NAD83
datums over the following decade.


Examples of Datum Shifts 


Datum Shifts 


Almost all locations have different coor-
dinates when specified in different datums.
The latitudes and longitudes change because
newer datums use different, usually improved
data, methods, and models. Objects appear
to shift between datums and misalign across
datums, sometimes significantly so.

Figure 3-12 illustrates datum shifts at a
well-measured NGS mark. The ITRF/
WGS84 and current NAD83 datum coordi-
nates can differ by as much as two meters,
even though they are both based on modern
satellite and other accurate measurements.
Any analysis with accuracy requirements
smaller than the datum shift will be unreli-
able, because the underlying datum shifts are
larger than the acceptable error specification. We
should be careful in converting all data to the
same datum when combining data from dif-
ferent sources.

Notice that the datum shift between the
legacy NAD27 and NAD83(1986) is quite
large, approximately 40 meters (130 feet),
typical of the up to hundreds of meters of
shifts from early, regional datums to modern,
global datums. The figure also shows the
subsequently smaller shifts for NAD83
datums through time, and relatively larger
distance between NAD83 and WGS84/ITRF
datums. The figure doesn't show the rela-
tively large shifts from the NAD83(2011) to
the NATRF2022, but these are expected to be
similar to those for WGS84.

Datum shifts associated with datum
transformations have changed with each suc-
cessive datum realization, as summarized in
Figure 3-13. Several datum pairs are consid-
ered equivalent for many purposes. The
WGS84(G730) was aligned with the ITRF92
datum, so these may be substituted in datum
transformations requiring no better than cen-
timeter accuracies. Similarly, the
WGS84(G1150) and ITRF00 datums have
been aligned, and may be substituted in most
subsequent transformations.

While locations in the NAD83(xxxx)
and the ITRF/WGS84 datums commonly
differ by over a meter, datum shifts internal
to these groupings have become small for
recent datums. Differences between
NAD83(HARN) and NAD83(xxxx) datums
may be up to 20 cm, but are typically less
than 4 cm, so these datum realizations may
be considered equivalent if accuracy limits
are above 20 cm. The differences between
NAD83(CORS96) and NAD83(2011) are
often a few centimeters, as are the differ-
ences among ITRF realizations.


We must emphasize a few points about
datums. First, different datums specify dif-
ferent coordinate systems. You should not expect
coordinates for any physical point to be the
same when they are expressed relative to dif-
ferent datums.


Second, different versions within a
datum family are different datums.
NAD83(1996) is a different realization than
NAD83(2011), and ITRF88 is different than
ITRF08. The datum is incompletely speci-
fied unless the version is noted. Many GIS
data sets refer to a datum without the ver-
sion, for example, NAD83. This is indeter-
minate and confusing, and shouldn't be
practiced. It forces the user to work with
ambiguity, which leads to blunders.

Third, coordinates specified relative to
officially adopted datums in most countries
change through versions. The NATRF2022
positions diverge from NAD83(2011) posi-
tions by 1 to 1.5 meters for most points in
North America. The NAD83(1986) datum
realization is up to two meters different than
the NAD83(CORS96). Differences in datum
realizations depend on the versions and loca-
tion on Earth. Data may not overlap cor-
rectly for different datum versions, even if
you use standard datums established for a
country.

Finally, all data layers used in an analy-
sis should be converted to the same datum
unless you are certain that the shifts between
datums are smaller than the accuracy
requirements for your analysis. You don't
need to convert all NAD83(CORS96) data to
the NAD83(2011) datum if your accuracy
requirements are 1.5 meters, as the datum
shift is typically a few centimeters, but you
do need to convert NAD83(CORS96) data to
ITRF(2008) when applying the same error
threshold. You should verify the magnitude
of datum shifts between any datum pair, and
apply a datum transformation if the shift
approaches your error tolerance.
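
This rule of thumb can be written as a simple check. The sketch below is only an illustration: the function name and the safety factor (which encodes "approaches" as half the tolerance) are our own assumptions, and the example shifts are rounded from the approximate figures quoted in the text.

```python
def needs_transformation(expected_shift_m, tolerance_m, safety_factor=0.5):
    """True when the datum shift is at or above a chosen fraction of the tolerance.

    safety_factor is a hypothetical choice; pick a value that matches how much
    of your error budget you are willing to spend on datum mismatch.
    """
    return expected_shift_m >= safety_factor * tolerance_m


if __name__ == "__main__":
    tolerance = 1.5  # meters, the accuracy requirement used in the text
    # NAD83(CORS96) to NAD83(2011): typically a few centimeters.
    print(needs_transformation(0.05, tolerance))
    # NAD83 to an ITRF realization: commonly over a meter.
    print(needs_transformation(1.5, tolerance))
```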


Datum Transformations 


Converting coordinates from one datum
to another is typically done using a datum
transformation. A datum transformation
provides the latitude and longitude of a point
in one datum when we know them in another
datum; for example, we can calculate the lat-
itude and longitude of a survey mark in
NAD83(2011) epoch 2020.0 when we
know these geographic coordinates in
ITRF08 epoch 2020.0 (Figure 3-14).


Datum transformations are often more complicated when they involve older datums. Many older datums were created piecemeal to optimize fit for a country or continent, so simple formulas often do not exist for transformations among many older datums. Specialized datum transformations may be provided, usually by government agencies. As an example, in the United States, the National Geodetic Survey created NCAT, a datum transformation tool to convert between various NAD datums.


Transformation among newer datums may use more general mathematical transformations between three-dimensional, Cartesian coordinate systems (Figure 3-14). Transformation equations allow conversion among most NAD83, WGS84, and ITRF geographic coordinate systems, and are supported in large part by the improved global measurements described in the previous few pages. This approach incorporates a shift in the origin, a rotation, and a change in scale from one datum to another.

A datum transformation is typically a multi-step process. In past times, empirical, grid-based methods were used because many early datums were not strictly derived from coherent mathematical surfaces. Later, a Molodensky transformation was common, using a system of equations with three or five parameters. More currently, a Helmert transformation is employed using seven or 14 parameters (Figure 3-14). First, geographic coordinates on the source datum are converted from longitude (λ) and latitude (φ) to X, Y, and Z Cartesian coordinates. An origin shift (translation), rotation, and scale are applied. This system produces new X, Y, and Z coordinates in the target datum. These X, Y, and Z Cartesian coordinates are then converted back to the longitudes and latitudes (λ and φ) in the target datum.

More advanced methods allow these seven transformation parameters to change through time, to account for tectonic and other shifts, for a total of 14 parameters. These methods are incorporated into software that calculates transformations among modern datums, for example, the Horizontal Time-Dependent Positioning (HTDP) tool available from the U.S. NGS (www.ngs.noaa.gov/TOOLS/Htdp/Htdp.shtml). HTDP converts among recent NAD83 datums and most ITRF and WGS84 datums.

Because of tectonic plate movement, the most precise geodetic measurements refer to the epoch, or fixed time period, at which the point was measured or datum fit. The HTDP software includes options to calculate the shift in a location due to different reference datums [for example, NAD83(CORS96) to WGS84(G1150)], the shift due to different realizations of a datum [for example, NAD83(CORS96) to NAD83(2011)], the shift due to measurements in different epochs [for example, NAD83(CORS96) 1997.0 to NAD83(CORS96) at a later epoch], and the shift due to a combination of these factors.
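The seven-parameter Helmert sequence described above (geographic to Cartesian, then shift, rotate, and scale, then back to geographic) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ellipsoid constants are those of GRS80, the rotation uses the small-angle "position vector" sign convention (the alternative "coordinate frame" convention reverses the rotation signs), and the function names and any parameter values you pass in are hypothetical. Real work should use published parameters for a specific datum pair and a tested tool such as HTDP or NCAT.

```python
import math

# GRS80 ellipsoid constants (assumed here for illustration).
A = 6378137.0                # semi-major axis, meters
F = 1.0 / 298.257222101      # flattening
E2 = F * (2.0 - F)           # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h=0.0):
    """Latitude, longitude (degrees) and ellipsoidal height -> X, Y, Z (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, iterations=10):
    """X, Y, Z (m) -> latitude, longitude (degrees), ellipsoidal height (m).

    Simple fixed-point iteration; not suitable exactly at the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - E2))  # initial guess (exact when h = 0)
    h = 0.0
    for _ in range(iterations):
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


def helmert7(xyz, tx, ty, tz, rx, ry, rz, scale_ppm):
    """Small-angle 7-parameter transform (position vector convention).

    Translations in meters, rotations in radians, scale in parts per million.
    """
    x, y, z = xyz
    s = 1.0 + scale_ppm * 1e-6
    xt = tx + s * (x - rz * y + ry * z)
    yt = ty + s * (rz * x + y - rx * z)
    zt = tz + s * (-ry * x + rx * y + z)
    return xt, yt, zt


def transform(lat_deg, lon_deg, h, params):
    """Source-datum geographic -> target-datum geographic coordinates."""
    xyz = geodetic_to_ecef(lat_deg, lon_deg, h)
    return ecef_to_geodetic(*helmert7(xyz, *params))
```

With all seven parameters set to zero, `transform` returns the input coordinates, which is a quick sanity check. A 14-parameter version would add a rate for each parameter and evaluate `param + rate * (epoch - reference_epoch)` before calling `helmert7`.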

Summarizing key insights, datums represent our best estimate of well-measured coordinate locations at a given time, and coordinates in one datum may differ from coordinates in the previous datum for the same points. These differences may be small and ignored with little penalty in some specific instances, typically when the changes between datums are smaller than the spatial accuracy required for our analysis. However, many datum shifts are quite large, up to tens of meters. One should know the magnitude of the datum shifts for the area and datum transformations of interest.


Second, we should use the appropriate datum transformation. There is no generic transformation between NAD83 and WGS84. Rather, there are transformations between specific versions of each, for example, from NAD83(CORS96) to WGS84(G1150). Our default practice should be to use the proper datum transformation, for the proper version, to transform all data to the same datum.


Finally, GNSS positioning is good enough that we can measure the change in position over relatively short time periods, so the same feature measured at two different dates, even with the same datum, will have two different sets of coordinates. Unless proven otherwise, all data should be converted to the same coordinate system, based on the same datum, at the same epoch date, or time period. We must note the epoch of data collection, and convert all data to a standard epoch, unless we know that the coordinate drift is less than accuracy needs. If not, data may misalign. We should record the original datum under which the data were collected, and the type and method of any datum transformation applied to the data.
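As a rough illustration of the epoch check described above, the following sketch estimates how far a point drifts between two epochs and compares that drift with an accuracy requirement. It assumes a constant horizontal velocity over the interval, which is a simplification; the function names and the velocity values in the usage note are hypothetical, and real velocities should come from a model such as the one in HTDP.

```python
import math


def epoch_drift_m(v_east_m_per_yr, v_north_m_per_yr, epoch_from, epoch_to):
    """Horizontal displacement (meters) between two epochs.

    Assumes a constant velocity, so drift is simply velocity times elapsed years.
    """
    dt = epoch_to - epoch_from
    return math.hypot(v_east_m_per_yr * dt, v_north_m_per_yr * dt)


def needs_epoch_conversion(drift_m, accuracy_m):
    """True when the drift is not safely below the accuracy requirement."""
    return drift_m >= accuracy_m
```

For example, a hypothetical velocity of 3 cm/yr east and 4 cm/yr north over ten years gives a drift of about half a meter, so `needs_epoch_conversion(epoch_drift_m(0.03, 0.04, 2010.0, 2020.0), 0.1)` would flag data held to a 10 cm requirement.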
Heights


The Geoid 


We often need to specify the elevation of objects, e.g., to measure slope, drop, or visibility. We adopt a gravity-related vertical reference called a geoid as a starting point for measuring heights (Figure 3-15). We don't use the ellipsoid as a height reference because Earth density and hence gravity variations cause regions to dip below or bulge above a reference ellipsoid. The world's oceans, our historical reference for heights, approximately follow a gravitationally defined geoid surface.

The geoidal surface may be thought of as an imaginary sea that covers the entire Earth, affected only by Earth's gravity. This geoid rises above or dips below the Earth's ellipsoid by up to 100 meters across the globe (Figure 3-16). Although it may at first seem difficult to believe, the "average" ocean surface near Iceland is more than 150 meters "higher" than the ocean surface northeast of Jamaica when measured relative to our mathematical ellipsoid. Since gravity pulls in a direction that is perpendicular to the geoidal surface, and the pull of gravity near the ellipsoid surface is stronger in some areas than others, these geoid variations represent persistent bulges and dips in the mean ocean height above and below our mathematically defined ellipsoid. Variations in ocean heights due to swells and wind-driven waves are more apparent at local scales, but are much smaller than the long-distance geoidal undulations.


We define a geoid using a three-dimensional equipotential surface, along which the pull of gravity is a specified constant, or more precisely, the gravitational potential energy is at a constant. We refer to an equal pull of gravity when describing the geoidal equipotential surface, a familiar non-mathematical description. Geodesists use the more precise, physically defined term gravitational potential, related to the amount of work required to move an object against the Earth's gravitational force. Pure water in a lake or ocean, absent wind or other forces, will have a surface that conforms to a gravitational equipotential surface. Absent these other forces, the gravitational force "levels" the water.


The geoid is a measured and interpolated surface; unlike an ellipsoid, the geoidal surface is not defined by a simple mathematical equation. We use a number of methods to measure the geoid, historically with trigonometric leveling, and more in modern times with various types of gravimeters (Figure 3-17). These devices measure the absolute or relative gravitational force, and are placed or carried on or above the Earth's surface. Recent increases in airborne gravimetry and various satellite systems have substantially improved our measurements of the geoidal surface.


Because we have two reference surfaces, a geoid and an ellipsoid, we also have two different kinds of heights. Elevation is typically defined as the distance above a geoid, often called height above mean sea level, but more unambiguously referred to as the orthometric height (Figure 3-18). Heights above an ellipsoid, or ellipsoidal heights (or height above ellipsoid, HAE), are used in some coordinate system calculations and for some global navigation systems such as GPS, but ellipsoidal heights are not our standard way to measure elevation. These heights are illustrated in Figure 3-18, with the ellipsoidal height labeled h and orthometric height labeled H. The difference between the ellipsoidal height and orthometric height at any location is shown in Figure 3-18 as N, and has various names, including geoidal height and geoidal separation.
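The relationship among the three quantities can be written as N = h - H, so an orthometric height follows from an ellipsoidal height and a geoidal height as H = h - N. The short sketch below shows this arithmetic; the function name and the sample values in the usage note are hypothetical, and the small difference between the curved height path and a straight line is ignored.

```python
def orthometric_height(h_ellipsoidal_m, n_geoidal_m):
    """Orthometric height H from ellipsoidal height h and geoidal height N.

    Uses H = h - N, with N positive where the geoid lies above the ellipsoid.
    """
    return h_ellipsoidal_m - n_geoidal_m
```

For instance, a GNSS receiver reporting an ellipsoidal height of 100.0 m at a place where the geoid lies 30.0 m below the ellipsoid (N = -30.0 m) corresponds to an orthometric height of 130.0 m.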
Orthometric heights are defined as the vertical distance measured from a reference geoid to the ground surface, along a line that is at right angles to intervening equipotential surfaces (Figure 3-19). This height line may bend due to small undulations in the equipotential surfaces. The height paths are not the same as a straight line normal to the ellipsoid and up to the surface, and not the same as a straight line that is normal to the geoid surface at the starting point, but the differences between these paths are usually quite small.


We emphasize that a geoidal surface usually differs from the average sea height measured at any one location. A local mean sea level may be higher or lower than a global mean sea level because non-gravitational forces such as persistent winds, ocean currents, temperature, and salinity variations can cause persistent locally high or low areas in the oceans. These non-gravitational differences in height can be up to a meter (3 feet), perhaps small on a global scale, but significant for local or regional measurements. Our geoid follows a gravitational equipotential surface which best matches, in a least squares sense, mean sea level when averaged across the globe. The local, long-term average sea height is different from the global spatial averaging of sea height defined by a geoid at many locations.


Vertical Datums and Heights


To specify heights we define a reference surface called a vertical datum, and specify orthometric heights relative to that surface. Modern vertical datums typically identify a specific geoid, although they may choose surfaces that average across regions, and so do not strictly follow the geoid surface everywhere. We measure gravity to estimate the specific geoidal surface, and carefully measure orthometric heights on the ground surface. For example, Canada and the U.S. defined the equipotential surface at a gravity potential value of W0 = 62,636,856.0 m²/s² as the reference for their future vertical datums, and strive to measure this surface with sufficient accuracy to specify heights across North America.

Governments have often adopted “hybrid” geoids that combine their own precise vertical surveys with gravity measurements and models, primarily due to historic gaps in gravimetric measurements, and also to match long-term practice. In mainland Australia, heights are relative to measurements averaged over 30 tidal gages spread along the coast, because they have an approximately 1-meter decline in the geoid height relative to tidal gage measurements from the northeast to the southwestern part of the country. Their vertical datum is related to, but deviates from, a specific geoid. Various European countries adopt base points near different long-term tidal gages. Because zero height may differ, you must be careful when mixing standard heights across countries.


The most current datum in the U.S. and 


official vertical datum. They combine past trigonometric leveling surveys (Figure 3-20) along with current gravimetric measurements to establish a vertical reference. Trigonometric leveling starts at a known elevation, often a seaside bench mark. Distances and elevation differences are precisely measured from there to other points. Vertical angles and surface distance measurements are used to estimate horizontal and vertical distances at a series of points. Over short distances, the relative heights can be measured with millimetric accuracies. Tens of thousands of kilometers of vertical leveling over more than a century have provided the relative heights for thousands of points within North America.


However, because leveling is difficult to adjust for gravimetric variation between stations, and because gravimetric measurements have historically been sparse, long-distance leveling networks provide relatively imprecise orthometric heights over the entire network. Recent improvements in measurements and analysis have revealed continental biases in height measurements and geoidal surface estimates in North America, leading to current efforts to increase the density and quality of gravity measurements to better specify the geoid.

The current reference geoid, named GEOID18, combines GNSS data, satellite gravimetry, and airborne measurements with some trigonometric leveling data. The NASA GRACE mission, launched in 2002, measured the distance between a pair of satellites as they orbited the Earth. The satellites were pulled closer or drifted farther from the Earth due to variation in the gravity field, and precise inter-satellite distances allow highly accurate estimates of gravity. Because the orbital path changes slightly each day, over time we obtained nearly complete Earth coverage. The ESA GOCE satellite launched in 2009 used precision accelerometers to measure gravity-induced velocity change. Combined, the GRACE and GOCE observations substantially improved our estimates of the gravitational field and geoidal shape, averaged over large areas. Short-distance variation in gravity fields is provided by surface or airborne measurements. In the U.S., the GRAV-D measurement campaign quantified local anomalies by flying gravimeters over sparsely measured or highly variable regions. Together with surface gravimetry, we've substantially improved our mapping of gravitational surfaces in the past 20 years. These measurements led to a series of geoid estimates and vertical datum improvements, with the most current named the North American Vertical Datum of 1988 (NAVD88), based on GEOID18.


As with horizontal datums, there are various “legacy” vertical datums, no longer standard, but important to know about because heights may be referenced to these datums. The National Geodetic Vertical Datum of 1929 (NGVD29) was the first widespread datum in North America, based almost entirely on trigonometric leveling from seaside benchmarks. This was replaced by the first version of NAVD88, using a hybrid geoid named GEOID03 and associated with the NAD83(1996) horizontal datum, and has been subsequently improved to the current GEOID18. Each successive improvement integrated increasingly dense and accurate gravimetric measurements with improved horizontal measurements and analysis, primarily from better GNSS.


Care should be taken when combining height data referenced to different vertical datums, particularly due to multiple geoid updates near the turn of the 21st century. Vertical datums are typically developed in concert with horizontal datums, so we should pair our horizontal and vertical datums when combining or converting coordinate data (Figure 3-21); that is, we should convert our horizontal data to a standard datum, and our vertical data to the corresponding vertical datum.


At this writing, the U.S. is estimating a new modern global geoid, named GEOID2022, defining the geopotential field that best approximates global mean sea level at a consistent data collection epoch. In addition, three refined, regional, gridded models will be fit for North America and overseas territories. These are called models because we measure geoidal heights at points or along lines at various parts of the globe, but we need geoidal heights everywhere. Equations are statistically fit to relate the measured geoidal heights to geographic coordinates. Given any set of geographic coordinates, we may then estimate the geoidal height there and, combined with an ellipsoidal height, the orthometric height.

The GEOID2022 will be integral to a new North American vertical datum, called the North American-Pacific Geopotential Datum of 2022 (NAPGD2022). This work is in concert with developing the horizontal NATRF2022 datum, and will harmonize horizontal and vertical positioning. The new vertical datum will integrate extensive new airborne gravity surveys of the entire U.S. with previous data to yield a geoid surface with estimated accuracies within 1 cm.

The NAPGD2022 vertical datum will be a substantial improvement, but will significantly alter heights across North America (Figure 3-22). Newer, more accurate measurements revealed a continental bias in geoid heights tilting from over 1.2 meters (4 ft) in Washington state to zero in Florida. There are also significant local anomalies in height due to measurement error and sparsity, and geophysical dynamics such as post-glacial rebound, earthquakes, and sea level rise. This means all heights will change, some by more than a meter (3 feet), when we adopt the new height datum.

Figure 3-21: Recommended pairing for horizontal and vertical datums in North America.

  Vertical datum and geoid          Horizontal datum
  NGVD29, no geoid                  NAD27, NAD83(1986)
  NAVD88, GEOID03                   NAD83(1996)
  NAVD88, GEOID09                   NAD83(NSRS2007)
  NAVD88, GEOID12B and GEOID18      NAD83(2011)
  NAPGD2022                         NATRF2022

The NAPGD2022 is a conceptual shift in how we will measure heights going forward. Past practice measured heights from established physical benchmarks. GNSS systems such as GPS now allow us to rapidly, accurately, and inexpensively determine ellipsoidal heights at any point on the surface of the Earth. When combined with an accurate model of geoidal heights, we may then use the method in Figure 3-18 to calculate orthometric height. Current technologies will yield centimeter-level height accuracy quickly, at relatively low cost, everywhere. This reliance on GNSS will greatly increase the speed and accuracy of determining elevations, but is dependent on an accurate model such as that produced by the NAPGD2022.


VDatum 

Given that vertical datums and associated geoids change through time, the United States National Geodetic Survey (NGS) has created a tool, VDatum, to estimate conversions among vertical datums in the U.S. (Figure 3-23). VDatum calculates the vertical difference from one datum to another at any given horizontal coordinate location and height. Conversions are provided between the 1929 and modern datums, between WGS84/ITRF and NAVD datums, and between various ellipsoid versions within the NAVD88 datum.

Because the vertical datum shift will vary as a function of position, a latitude and longitude must be provided, and because the shift may also depend somewhat on elevation, a vertical height must also be entered. As shown in the example in Figure 3-23, the shifts can be quite large, particularly when converting between NAVD and WGS84/ITRF, and also from NGVD 1929 to NAVD88 datums. The vertical datum shift typically changes slowly with distance, so one offset may be suitable for all height shifts over a few to tens of square kilometers. The amount of error and "safe" distance to span varies by region, so the magnitude of the transformation should be verified at several points across any new study area to see how broadly an offset may be used.

VDatum may also be used to estimate shifts in height among geoid versions. New geoid surfaces have been estimated approximately every three years since 1996 for North America, and heights at any given point will change between geoids. If heights relative to different geoids are to be combined, one set of heights must be adjusted to match the geoid of the other. This is typically achieved by adding an offset calculated from the models included in VDatum.


As an example, I may have two elevation data sets, both in Eureka, California, near a point with latitude 40.8019, longitude -124.1636, and approximately a 10-meter height. One elevation is measured relative to the GEOID96 version of the NAVD88, and the other using the GEOID12A version. I can use VDatum to calculate the vertical height shift due to this difference in geoids; at that coordinate and height, it estimates a 31 cm, or approximately 1 foot, increase in height between these two geoids. This means I would have to add 31 cm to all my 96 heights before combining them with my 12A heights.
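Applying such an offset is simple arithmetic, sketched below. The 0.31 m value is the VDatum estimate quoted in the Eureka example; the function name and the list of heights are hypothetical, and a single offset is only appropriate over an area small enough that the shift is effectively constant.

```python
def shift_heights(heights_m, offset_m):
    """Add one vertical datum offset (meters) to every height in a list."""
    return [round(h + offset_m, 3) for h in heights_m]


# Hypothetical GEOID96-based heights near the Eureka example point.
geoid96_heights = [10.00, 10.25, 9.80]
geoid12a_equivalent = shift_heights(geoid96_heights, 0.31)
```

After the shift, the adjusted values can be combined directly with heights that were measured against GEOID12A.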


Figure 3-22: Approximate predicted change in height when changing from the NAVD88 to NAPGD2022 datum.


Dynamic Heights


We must discuss another kind of height, called a dynamic height, because it is important for certain applications. Dynamic heights measure the change in gravitational pull from a given equipotential surface. Dynamic heights are important when we are interested in water levels and flows across elevations. Points that have the same dynamic heights can be thought of as being at the same water level. Surprisingly, points with the same dynamic heights often have different orthometric heights (Figure 3-24). To be clear, two distinct points at water's edge on a large lake often do not have the same elevations; often, they are at different orthometric heights above our reference geoid. Since orthometric heights are our standard for specifying elevation, this means water may indeed flow uphill relative to our standard height measurement, or as confusingly, a lake may have a different elevation on one shore than on the opposite shore.


To understand why water may flow uphill (from lower to higher orthometric heights), it is important to remember how orthometric heights are defined. An orthometric height is the distance, in the direction of gravitational pull, from the geoid up to a point. But remember, the geoid is a specified gravity value, an “equipotential” surface, where the pull of gravity is at some specified constant. As we move up from the geoid toward the surface, we pass through other equipotential surfaces, each at a slightly weaker gravitational force, until we arrive at the surface point. But these gravity surfaces are not always parallel, and may be more closely packed in one portion of the globe than another.


There are two key points. First, water spreads out to level across an equipotential surface, absent wind, waves, and other factors. The water level in a still bathtub, pond, or lake has the same equipotential surface at one end as another. Gravity ensures this. Second, the equipotential surfaces are closer together when nearer the mass center of Earth. As the equipotential surfaces converge, or become “denser,” the water surface seems to dip below our fixed orthometric height.

Figure 3-23: An example of the application of the vertical datum transformation software VDatum.

Because water follows an equipotential surface, and because the Earth's polar radius is less than the equatorial radius, the orthometric heights of the water surface on large lakes are usually different at the north and south ends. For example, as you move farther north in the Northern Hemisphere, the equipotential surfaces converge due to the smaller polar radius with increasing gravitational pull (Figure 3-24). An orthometric height is a fixed height above the geoidal surface, so the northern orthometric height will pass through more equipotential surfaces than the same orthometric height at a more southerly location. An orthometric height of the water surface at the south end of the lake will be higher than at the north end. For example, in Lake Michigan, a large lake in North America, the elevation of the water surface at the south end is approximately 15 cm higher than the elevation of the water surface at the north end.

Figure 3-24: An illustration of how dynamic heights differ from orthometric heights (panel label: identical orthometric heights).


Dynamic heights are most often used when we're interested in relative heights for water levels, particularly over large lakes or connected water bodies. Because equal dynamic heights are at the same water level, we use them when interested in accurately representing hydrologic drop, head, pressure, and other variables related to water levels across distances. But these differences can be confusing when observing bench mark or sea level heights, and underscore again that our height reference is not mean sea level, but rather an estimated geoidal surface.


Local Sea Level Datums


Water height measurements along the U.S. coast are typically referenced to local sea level datums. As noted earlier, mean sea level is not zero for almost all points along North America's coastline. Elevations are measured relative to a geoid. Zero elevation coincides with zero mean sea level at only one standard coastal station in Canada, near the center of the continent. Variations in gravity, currents, salinity, tides, and wind produce mean sea levels that are different from zero by up to several meters (10s of feet) around the rest of the continental rim. But we still need to know the ocean level along the coastline for many practical purposes, including construction, flood protection, and water management. We have established a network of long-term, reference measurement stations along the coastline. We precisely measure both sea level and the station orthometric height so that we can tie our standard elevation to local water heights.


Figure 3-25: Local sea level datum for Seattle, Puget Sound, WA, showing mean sea level and other measures at a NOAA long-term tidal gage. All figures are in feet above the local datum: local mean high water, 10.49 ft; local mean low water, 2.83 ft; NAVD88 zero height, 2.34 ft above local zero; local mean low-low water, zero height.


Data for measured tidal stations are available from the NOAA web page:

https://tidesandcurrents.noaa.gov

These sites report mean sea level, as well as mean high, low, and extreme water levels (Figure 3-25). Most importantly, they also report the NAVD88 orthometric heights for each tidal station, allowing a conversion from local sea level heights to standard orthometric heights used to specify elevation.

Figure 3-25 shows data for a station in Seattle measured since 1899. Mean sea level has a local reference height of 6.64 feet, meaning the sea level averages that height above the long-term measurement of a given low water height. The NAVD88 zero height at the same point is 2.34 feet above the local zero, which yields a sea level height of 6.64 - 2.34, or 4.3 feet. As strange as it may seem at first, the mean sea level at this Seattle station has an elevation of 4.3 feet. Any point nearby that has an elevation less than 4.3 feet will be below sea level, and will likely flood frequently if there is access to the sea. Local construction, water level measurements, or other activities dependent on sea level will reference this station's measurements, and the 4.3-foot offset between mean sea level and seaside orthometric heights.

This mean sea level offset varies by location; for example, Port San Luis, CA, has a vertical offset of 2.7 feet, and Vaca Key, FL, has an offset of -0.8 feet. When heights of sea level are important for an analysis, projects should reference the nearest local sea level datum.
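The Seattle arithmetic above can be expressed as a small conversion from heights on a local tidal datum to NAVD88 elevations. This is a minimal sketch: the function names are hypothetical, the station values are the ones quoted in the text, and values for any other station must be read from its own NOAA record.

```python
def local_to_navd88_ft(height_above_local_zero_ft, navd88_zero_above_local_zero_ft):
    """Convert a height on the local tidal datum to a NAVD88 elevation (feet)."""
    return height_above_local_zero_ft - navd88_zero_above_local_zero_ft


def is_below_mean_sea_level(elevation_navd88_ft, msl_navd88_ft):
    """True when a NAVD88 elevation lies below local mean sea level."""
    return elevation_navd88_ft < msl_navd88_ft


# Seattle station values from the text: mean sea level 6.64 ft and NAVD88 zero
# 2.34 ft, both measured above the local datum zero.
seattle_msl_navd88_ft = local_to_navd88_ft(6.64, 2.34)
```

With these numbers the mean sea level elevation is about 4.3 feet, so `is_below_mean_sea_level(3.0, seattle_msl_navd88_ft)` is true for a nearby point with a 3-foot elevation.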
Map Projections and Coordinate Systems


Datums tell us the latitudes and longitudes of features on an ellipsoid. Most spatial analyses are performed on Cartesian surfaces, so we must transfer geographic coordinates from the curved ellipsoid to a flat map. We do this via a map projection, a systematic rendering from the curved Earth to a flat map.

Nearly all projections are applied via exact or iterated mathematical formulas that convert geographic latitudes/longitudes to projected X/Ys (or eastings/northings). Figure 3-26 shows one of the simpler projection equations, between Mercator and geographic coordinates, assuming a spherical Earth. These equations would be applied for every point, vertex, node, or grid cell in a data set, converting the vector or raster data feature by feature from geographic to Mercator coordinates.

Figure 3-26: Formulas are known for most projections; these are the equations defining the Mercator projection for a sphere.

Conversion from geographic (lon, lat) to projected coordinates. Given longitude = λ, latitude = φ (all angles in radians), the Mercator projection coordinates are:

$$x = R(\lambda - \lambda_0)$$
$$y = R \ln\left[\tan\left(\frac{\pi}{4} + \frac{\phi}{2}\right)\right]$$

where R is the radius of the sphere at map scale (e.g., Earth's radius), ln is the natural log function, and λ0 is the longitudinal origin (Greenwich meridian).

Inverse equations, from x, y to λ, φ:

$$\lambda = \frac{x}{R} + \lambda_0$$
$$\phi = \frac{\pi}{2} - 2\tan^{-1}\left(e^{-y/R}\right)$$
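The spherical Mercator equations of Figure 3-26 translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, the default radius of 6,371,000 m is an assumed mean Earth radius, and production work should rely on an established projection library, which also handles the ellipsoidal case.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean spherical radius, meters


def mercator_forward(lon_deg, lat_deg, radius=EARTH_RADIUS_M, lon0_deg=0.0):
    """Geographic (lon, lat in degrees) -> spherical Mercator x, y (meters)."""
    lam = math.radians(lon_deg)
    lam0 = math.radians(lon0_deg)
    phi = math.radians(lat_deg)
    x = radius * (lam - lam0)
    y = radius * math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M, lon0_deg=0.0):
    """Spherical Mercator x, y (meters) -> geographic (lon, lat in degrees)."""
    lam = x / radius + math.radians(lon0_deg)
    phi = math.pi / 2.0 - 2.0 * math.atan(math.exp(-y / radius))
    return math.degrees(lam), math.degrees(phi)


def project_features(lonlat_pairs, **params):
    """Apply the forward equations to every coordinate pair in a data set."""
    return [mercator_forward(lon, lat, **params) for lon, lat in lonlat_pairs]
```

Projecting a coordinate with `mercator_forward` and passing the result to `mercator_inverse` with the same radius and origin should return the starting longitude and latitude, which is a useful check. The forward equation is undefined at the poles, where the tangent term grows without bound.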


Notice that there are parameters we must specify for this projection, here R, the Earth's radius, and λ0, the longitudinal origin. Different values for these parameters give different values for the coordinates, so even though we may have the same kind of projection (e.g., transverse Mercator), we have different versions each time we specify different parameters.

Projection equations must also be specified in the “backward” direction, from projected coordinates to geographic coordinates. The projection equations in this backward, or “inverse,” direction are often much more complicated than the forward direction, but are specified for every commonly used projection.


Most projection equations are much more complicated than the transverse Mercator, in part because most adopt an ellipsoidal Earth, and because the projections are initially onto simply curved surfaces, and then on to a plane. Thankfully, projection equations have long been standardized, documented, and made widely available through proven programming libraries and projection calculators.

Note that each projection defines a Cartesian coordinate system and hence creates grid north, a third version of the northern direction, in addition to geographic and magnetic north. Grid north is the direction of the Y axis in a map projection, and often equals or nearly equals the direction of a geographic meridian near the center of the projected area. Grid north is typically different from geographic and magnetic north for most of the projected region.

Most map projections may be viewed as sending rays of light from a projection source through the ellipsoid and onto a map surface (Figure 3-27). In some projections, the source is not a single point; however, the basic process involves the systematic transfer of points from the curved ellipsoidal surface to a flat map surface.

Figure 3-27: A conceptual view of a map projection.


Distortions are unavoidable when making flat maps because of the transition from a complexly curved Earth surface to a flat or simply curved map surface. Portions of the rendered Earth surface must be compressed or stretched to fit onto the map. This is illustrated in Figure 3-28, a side view of a projection from an ellipsoid onto a plane. The map surface intersects the Earth at two locations, I1 and I2. Points toward the edge of the map surface, such as D and E, are stretched apart. The scaled map distance between d and e is greater than the distance from D to E measured on the surface of the Earth. More simply put, the distance along the map plane is greater than the corresponding distance along the curved Earth surface. Conversely, points such as A and B that lie in between I1 and I2 would appear compressed together. The scaled map distance from a to b would be less than the surface measured distance from A to B. Distortions at I1 and I2 are zero.


Figure 3-28 demonstrates a few important facts. First, distortion may differ in sense across the map. Parts of the map may have compressed areas or distances relative to the scaled Earth's surface measurements, while other parts may have expanded areas or distances. Second, there are often a few points or lines where distortions are zero and where length, direction, or some other geometric property is preserved. Finally, distortion is usually small near the points or lines of intersection, and increases with increasing distance from the points or lines of intersection.


Different map projections may distort the globe in different ways. The projection source, represented by the point at the middle of the circle in Figure 3-28, may change locations. We may project onto different surface shapes, and we may place the projection surface at different locations relative to the globe. If we change any of these three factors, we will change how or where our map is distorted. The type and amount of projection distortion may guide the selection of the appropriate projection or limit the area projected.


Figure 3-28: Distortion during map projection. This side view shows both expansion and compression of areas on a planar map. Labels: ellipsoid, projection "light" source; map distance ab < AB, map distance de > DE.

Figure 3-29 shows an example of distortion with a projection onto a planar surface, but from above rather than the side view in Figure 3-28. This planar surface intersects the globe at a line of true scale, the solid circle shown in Figure 3-29. Distortion increases away from the line of true scale, with features inside the circle compressed or reduced in size while features outside the standard circle are expanded. Calculations show a scale error of -1% near the center of the circle, and increasing scale error in concentric bands outside the circle to over 2% near the outer edges of the projected area.

An approximation of the distance distortion may be obtained for any projection by comparing grid coordinate distances to great circle distances. A great circle distance is defined on the surface of the spheroid or ellipsoid (Figure 3-30). The great circle distance is the shortest path between two points on the surface of the ellipsoid, and by approximation, Earth.

Figure 3-30 illustrates the calculation of both the great circle and projection, or Cartesian, distances for two points in the southern U.S., using the spherical approximation formula introduced in Chapter 2. We use a spherical approximation of the Earth's shape because it is accurate enough for illustration. The difference between this simpler spheroidal method and an ellipsoidal method is typically much less than 0.1%, and always less than 0.3%, so typically less than 50 cm (1.5 feet) in our example.

Projected (Cartesian) coordinates in this example are in the UTM Zone 15N coordinate system, and derived from the appropriate coordinate transformation equations. Armed with the coordinates for both points in both the geographic and projected coordinate systems, we can calculate the distance in the two systems, and subtract to find the length distortion due to projecting from the spherical surface to a flat surface.

Note that web-based or other software may report incorrect distances, because they may use a spheroidal approximation, use different Earth radii, or may use computationally efficient but more approximate formulas or algorithms, so it is best to calculate the values from the original formulas to ensure accurate results.


Figure 3-30: Great Circle vs. Projected Distance (Spherical Approximation)

Using the great circle formula from our example in Chapter 2, consider two points: A, with latitude, longitude of (φA, λA), and B, with latitude, longitude of (φB, λB). The great circle distance from point A to point B is given by the formula:

$$d = 2r \sin^{-1}\left[\sqrt{\sin^2\left(\frac{\phi_A - \phi_B}{2}\right) + \cos\phi_A \cos\phi_B \sin^2\left(\frac{\lambda_A - \lambda_B}{2}\right)}\right]$$

A, corresponding to Baton Rouge, LA = 30.4877456°, -91.1693348°
B, corresponding to Houston, Texas = 29.7507171°, -95.370003°

$$d = 2 \cdot 6{,}378 \cdot \sin^{-1}\left[\sqrt{\sin^2(0.3685143) + \cos(30.4877456)\cos(29.7507171)\sin^2(2.1003341)}\right] = 412.681 \text{ km}$$

Grid distance (UTM Zone 15N coordinates):

Grid coordinates of Baton Rouge, LA = 675,708.2, 3,374,258.0

Grid coordinates of Houston, Texas = 270,816.1, 3,293,516.3

$$d_g = \left[(X_A - X_B)^2 + (Y_A - Y_B)^2\right]^{0.5}$$

$$d_g = \left[(675{,}708.2 - 270{,}816.1)^2 + (3{,}374{,}258.0 - 3{,}293{,}516.3)^2\right]^{0.5} = 412.864 \text{ km}$$

The distortion is 412.681 - 412.864 = -0.183 km, or a 183 meter lengthening.

A straight line between two points shown on a projected map is usually not a straight line nor the shortest path when traveling on the Earth's surface. Conversely, the shortest distance between points on the Earth's surface is likely to appear as a curved line on a projected map. The distortion is imperceptible for large-scale maps and over short distances, but exists for most lines.

Figure 3-31 illustrates straight line distortion. This figure shows the shortest distance path (the great circle) between Seattle, USA, and Paris, France. Paris lies almost due east of Seattle, but the shortest path traces a route north of an east-west line. This shortest path is distorted by the Plate Carrée projection commonly used for global maps, and appears curved.

Figure 3-31: Curved depiction of the great circle path between Seattle and Paris.
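The distance comparison in Figure 3-30 can be checked with a few lines of Python. This is a sketch under the same assumptions as the figure: a spherical Earth with a radius of 6,378 km and the UTM Zone 15N grid coordinates listed there. The function and variable names are illustrative only.

```python
import math

def great_circle_km(lat_a, lon_a, lat_b, lon_b, radius_km=6378.0):
    """Great circle distance on a sphere; inputs in decimal degrees."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    half_dphi = (phi_a - phi_b) / 2.0
    half_dlam = math.radians(lon_a - lon_b) / 2.0
    h = (math.sin(half_dphi) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(half_dlam) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(h))

def grid_km(x_a, y_a, x_b, y_b):
    """Straight-line (Cartesian) distance between grid coordinates in meters."""
    return math.hypot(x_a - x_b, y_a - y_b) / 1000.0

# Baton Rouge (A) and Houston (B), values taken from Figure 3-30.
d_gc = great_circle_km(30.4877456, -91.1693348, 29.7507171, -95.370003)
d_grid = grid_km(675708.2, 3374258.0, 270816.1, 3293516.3)
distortion_km = d_gc - d_grid   # negative: the projected line is longer

print(round(d_gc, 3), round(d_grid, 3), round(distortion_km, 3))
```

The sign convention follows the figure: a negative difference means the projected distance is longer than the great circle distance.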

Projections may also substantially distort the shape and area of polygons. Figure 3-32 shows various projections for Greenland, from an approximately "unprojected" view from space through geographic coordinates cast on a plane, to Mercator and transverse Mercator projections. Note the changes in size and shape of the polygon depicting Greenland.

Most map projections are based on a developable surface, a geometric shape onto which the Earth's surface is projected. Cones, cylinders, and planes are the most common developable surfaces. A plane is already flat, and cones and cylinders may be mathematically "cut" and "unrolled" to develop a flat surface (Figure 3-33). Projections may be characterized according to the shape of the developable surface, as conic (cone), cylindrical (cylinder), and azimuthal (plane). The orientation of the developable surface may also change among projections; for example, the axis of a cylinder may coincide with the poles (equatorial), the axis may pass through the Equator (transverse), or be at an angle (oblique).

Note that while the most common map projections used for spatial data are based on a developable surface, many map projections are not. Projections with names such as Mollweide, sinusoidal, and Goode homolosine are examples. These projections often specify a direct mathematical projection from an ellipsoid onto a flat surface. They use mathematical forms not related to cones, cylinders, planes, or other three-dimensional figures, and may change the projection surface for different parts of the globe, but generally are used only for display, and not for spatial analysis, because the coordinate systems are not strictly Cartesian.

Common Map Projections in GIS


There are hundreds of map projections used throughout the world; however, most spatial data in GIS are specified using a relatively small number of projection types.


The Lambert conformal conic and the transverse Mercator are among the most common projection types used for spatial data in North America, and much of the world (Figure 3-34). Standard sets of projections have been established from these two basic types. The Lambert conformal conic (LCC) projection may be conceptualized as a cone intersecting the surface of the Earth, with points on the Earth's surface projected onto the cone. The cone in the Lambert conformal conic intersects the ellipsoid along two arcs, typically parallels of latitude, as shown in Figure 3-34 (top left). These lines of intersection are known as standard parallels.
Distortion in a Lambert conformal conic projection is smallest near the standard parallels, where the developable surface intersects the Earth. Distortion increases in a complex fashion as distance from these parallels increases. Distortion is illustrated at the top right and bottom of Figure 3-34. Circles of a constant 5-degree radius are drawn on the projected surface at the top right, and approximate lines of constant distortion and a line of true scale are shown in Figure 3-34, bottom. Distortion decreases toward the standard parallels, and increases away from these lines. Distortions can be quite severe, as illustrated by the apparent expansion of southern South America. This property of a low-distortion band running in an east-west direction between the standard parallels makes the Lambert conformal conic projection common for mapping areas that are larger in an east-west direction.

Distortion is controlled by the placement of the standard parallels. The example in Figure 3-34 shows parallels placed such that there is a maximum distortion of approximately 1% midway between the standard parallels. We reduce this distortion by moving the parallels closer together, but at the expense of increasing distortion outside the zone between the lines.
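The trade-off in placing standard parallels can be explored numerically. The sketch below uses the spherical form of the Lambert conformal conic point scale factor, which equals 1 on each standard parallel, is less than 1 between them, and is greater than 1 outside them. The function name and the example parallels (33 and 45 degrees, then 36 and 42 degrees) are assumptions for illustration and are not the parallels used in Figure 3-34.

```python
import math

def lcc_scale_factor(lat_deg, std1_deg, std2_deg):
    """Point scale factor k for a spherical Lambert conformal conic.

    k = 1 on both standard parallels; k < 1 between them; k > 1 outside.
    """
    def t(deg):
        return math.tan(math.pi / 4.0 + math.radians(deg) / 2.0)

    cos1 = math.cos(math.radians(std1_deg))
    cos2 = math.cos(math.radians(std2_deg))
    # Cone constant n, fixed by requiring true scale on both parallels.
    n = math.log(cos1 / cos2) / math.log(t(std2_deg) / t(std1_deg))
    return cos1 * t(std1_deg) ** n / (math.cos(math.radians(lat_deg)) * t(lat_deg) ** n)

# Wide versus narrow spacing of the standard parallels.
for std1, std2 in [(33.0, 45.0), (36.0, 42.0)]:
    mid = lcc_scale_factor(39.0, std1, std2)      # midway between parallels
    outside = lcc_scale_factor(50.0, std1, std2)  # north of both parallels
    print(std1, std2, round(mid, 5), round(outside, 5))
```

With the narrower spacing, the scale factor midway between the parallels moves closer to 1 while the value at 50 degrees moves farther from 1, which is the behavior described above.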

The transverse Mercator is another common map projection. This map projection may be conceptualized as enveloping the Earth in a horizontal cylinder, and projecting the Earth's surface onto the cylinder (Figure 3-35). The cylinder in the transverse Mercator commonly intersects the Earth ellipsoid along a single north-south tangent, or along two secant lines, noted as the lines of true scale in Figure 3-35. A line parallel to and midway between the secant lines is often called the central meridian. The central meridian extends north and south through transverse Mercator projections.


As with the Lambert conformal conic, the transverse Mercator projection has a band of low distortion, but this band runs in a north-south direction. Distortion is least near the line(s) of intersection. The graph at the top right of Figure 3-35 shows a transverse Mercator projection with the central meridian (line of intersection) at W96°. Distortion increases markedly with distance east or west away from the intersection line; for example, the shape of South America is severely distorted in the top right of Figure 3-35. The drawing at the bottom of this same figure shows lines estimating approximately equal scale distortion for a transverse Mercator projection centered on the USA. Notice that the distortion increases as distance from the two lines of intersection increases. Scale distortion error may be maintained below any threshold by ensuring the mapped area is close to these two secant lines intersecting the globe. Transverse Mercator projections are often used for areas that extend in a north-south direction, as there is little added distortion extending in that direction.
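The growth of distortion away from the lines of intersection can be sketched with the spherical transverse Mercator scale factor, k = k0 cosh(x / (k0 R)), where x is the projected distance east or west of the central meridian. The values below are assumptions for illustration: a mean Earth radius of 6,371,000 m and a central scale factor k0 of 0.9996, the value used by UTM. With k0 = 1 the cylinder is tangent and scale is true only on the central meridian; with k0 < 1 the cylinder is secant and scale is true on two lines on either side.

```python
import math

def tm_scale_factor(x_from_cm_m, k0=0.9996, radius_m=6371000.0):
    """Spherical transverse Mercator scale factor at projected offset x."""
    return k0 * math.cosh(x_from_cm_m / (k0 * radius_m))

def true_scale_offset(k0=0.9996, radius_m=6371000.0):
    """Projected distance from the central meridian at which k equals 1."""
    return k0 * radius_m * math.acosh(1.0 / k0)

offset_m = true_scale_offset()
for x in (0.0, offset_m, 300000.0):
    print(round(x), round(tm_scale_factor(x), 6))
```

Between the two lines of true scale the map is slightly compressed (k < 1), and beyond them it is increasingly stretched (k > 1), which is why narrow north-south zones keep distortion within a chosen threshold.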

Different projection parameters may be used to specify an appropriate coordinate system for a region of interest. Specific standard parallels or central meridians are chosen to minimize distortion over a mapping area. An origin location, measurement units, x and y (or northing and easting) offsets, a scale factor, and other parameters may also be required to define a specific projection.

Figure 3-34: Lambert Conformal Conic Projection. Labels: cone-ellipsoid intersection, standard parallel, line of true scale.

Figure 3-35: Transverse Mercator Projection. Labels: cylinder-ellipsoid intersection.

The State Plane Coordinate System


The State Plane Coordinate System (SPCS) is a standard set of projections for the United States. The SPCS specifies positions in Cartesian coordinate systems for each state. There are one or more zones in most states, with slightly different projection parameters in each State Plane zone (Figure 3-36). Multiple State Plane zones are used to limit distortion errors due to map projections.

State Plane systems ease surveying, mapping, and spatial data development in a GIS, particularly when whole counties or larger areas are covered. The State Plane system provides a common coordinate reference for horizontal coordinates over county to multi-county areas while limiting distortion error to specified maximum values. Most states have adopted zones such that projection distortions are kept below one part in 10,000. Some states allow larger distortions (e.g., Montana, Nebraska) for the sake of having only one State Plane zone. SPCSs are used in many types of work, including property surveys, property subdivisions, large-scale construction projects, and photogrammetric mapping, and the zones and SPCSs are often adopted for GIS.

One State Plane projection zone may suffice for small states. Larger states commonly require several zones, each with a different projection, for each of several geographic zones of the state. For example, Delaware has one State Plane coordinate zone, while California has six, and Alaska has 10 State Plane coordinate zones, each corresponding to a different projection within the state. Zones are added to a state to ensure acceptable projection distortion within all zones (Figure 3-37, left). Zone boundaries are defined by county, parish, or other municipal boundaries. For example, the Minnesota south/central zone boundary runs approximately east-west through the state along defined county boundaries (Figure 3-37, left).

Most State Plane coordinate systems are based on one of two types of map projections: the Lambert conformal conic or the transverse Mercator projections. Because distortion in a transverse Mercator increases with distance from the central meridian, this projection type is most often used with states or zones that have a long north-south axis (e.g., Illinois or New Hampshire). Conversely, a Lambert conformal conic projection is most often used when the long axis of a state or zone is in the east-west direction (examples are North Carolina and Virginia).


Standard parallels for the Lambert conformal conic projection, described earlier, are specified for each State Plane zone. These parallels are placed at one-sixth of the zone width from the north and south limits of the zone (Figure 3-37, right). A zone central meridian is specified at a longitude near the zone center. This central meridian points at grid north; however, all other meridians converge to this central meridian, so they do not point to grid north. The Lambert conformal conic is used for State Plane zones for 31 states.


As noted earlier, the transverse Mercator specifies a central meridian. This central meridian defines grid north in the projection. A line along the central meridian points to geographic and grid north, and specifies the Cartesian grid direction for the map projection. All parallels of latitude and all meridians except the central meridian are curved for a transverse Mercator projection, and hence these lines do not parallel the grid x or y directions. The transverse Mercator is used for 22 State Plane systems (the sum of states is greater than 50 because both the transverse Mercator and Lambert conformal conic are used in some states, e.g., Florida).


Finally, note that more than one version of the State Plane coordinate system has been defined. The State Plane system was first introduced with NAD27, and modified for NAD83 such that for many states, the coordinates are different for any point. There will be a new State Plane system with the introduction of the NATRF2022.

Conversion among State Plane projections may be further confused by the various definitions used to translate from feet to meters. Geodesy measurements are based on the metric system, while for a long period of time, survey measurements in the United States used a non-standard conversion of one meter equal to exactly 39.37 inches. This yields a conversion for a U.S. survey foot of:


1 foot = 0.3048006096012 meters


The rest of the world adopted an international foot of:


1 foot = 0.3048 meters


The U.S. survey foot is officially abandoned as of December 31, 2022, but many older data sets were created using the U.S. survey foot. The slightly longer metric-to-foot conversion factor was often the default in many jurisdictions for State Plane coordinate systems within the United States, and is usually specified as sFT in metadata. Users should verify the units for any State Plane data set, particularly those that originated before 2023.
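The practical effect of mixing the two foot definitions is easy to quantify. The sketch below derives the U.S. survey foot from the exact ratio 1200/3937 (12 inches divided by 39.37 inches per meter) and reports how far a coordinate shifts when the wrong definition is applied. The 2,000,000 ft coordinate is a hypothetical value chosen only to represent the magnitude of typical State Plane coordinates, and the names are illustrative.

```python
from fractions import Fraction

US_SURVEY_FOOT_M = Fraction(1200, 3937)       # from 1 m = 39.37 in, exactly
INTERNATIONAL_FOOT_M = Fraction(3048, 10000)  # 0.3048 m, exactly

def feet_to_meters(value_ft, survey_foot=True):
    """Convert feet to meters with the chosen foot definition."""
    factor = US_SURVEY_FOOT_M if survey_foot else INTERNATIONAL_FOOT_M
    return float(Fraction(value_ft) * factor)

def survey_vs_international_m(value_ft):
    """Position shift, in meters, caused by using the wrong foot definition."""
    return float(Fraction(value_ft) * (US_SURVEY_FOOT_M - INTERNATIONAL_FOOT_M))

coordinate_ft = 2_000_000  # hypothetical State Plane easting in feet
print(feet_to_meters(coordinate_ft, survey_foot=True))
print(feet_to_meters(coordinate_ft, survey_foot=False))
print(survey_vs_international_m(coordinate_ft))
```

The difference is only about two parts per million, but at this coordinate magnitude it amounts to more than a meter, which is why the unit definition should be checked in the metadata.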


Universal Transverse Mercator 
Coordinate System 


The Universal Transverse Mercator (UTM) coordinate system is another standard, distinct from the State Plane system. The UTM is a global coordinate system, based on the transverse Mercator projection. It is widely used in the United States and other parts of North America, and is also used in many other countries.

The UTM system divides the Earth into zones that are 6 degrees wide in longitude, and extend from 80 degrees south latitude to 84 degrees north latitude. UTM zones are numbered from 1 to 60 in an easterly direction, starting at longitude 180 degrees West (Figure 3-38). Zones are further split north and south of the Equator. Therefore, the zone containing most of England is identified as UTM Zone 30 North, while the zones containing most of New Zealand are designated UTM Zones 59 South and 60 South. Directional designations are here abbreviated, for example, 30N in place of 30 North.


Distances in the UTM system are specified in meters north and east of a zone origin (Figure 3-39). The y values are known as UTM northings, and increase in a northerly direction. The x values are referred to as UTM eastings and increase in an easterly direction.

The origins of the UTM coordinate system are defined differently depending on whether the zone is north or south of the Equator. In either case, the UTM coordinate system is defined so that all coordinates are positive within the zone. Zone easting coordinates are all greater than zero because the central meridian for each zone is assigned an easting value of 500,000 meters. This effectively places the origin (E = 0) at a point 500,000 meters west of the central meridian. All zones are less than 1,000,000 meters wide, ensuring that all eastings will be positive.


The Equator is used as the northing origin for all north zones (Figure 3-39, left). Thus, the Equator is assigned a northing value of zero for north zones. This avoids negative coordinates, because all of the UTM north zones are defined to be north of the Equator.

Universal Transverse Mercator zones south of the Equator are slightly different than those north of the Equator (Figure 3-39, right). South zones have a false northing value added to ensure all coordinates within a zone are positive. UTM coordinate values increase as one moves from south to north in a projection area. If the origin were placed at the Equator with a value of zero for south zone coordinate systems, then all the northing values would be negative. An offset is applied by assigning a false northing, a nonzero value, to an origin or other appropriate location. For UTM south zones, the northing values at the Equator are set to equal 10,000,000 meters. Because the distance from the Equator to the most southerly point in a UTM south zone is less than 10,000,000 meters, this assures that all northing coordinate values will be positive within each UTM south zone (Figure 3-39).
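The zone numbering and false origin rules described above can be summarized in a short sketch. The function names are illustrative, and the zone calculation assumes the regular 6-degree grid; it ignores the few special-case zones at high northern latitudes near Norway and Svalbard. The inputs to `apply_false_origin` are projected offsets measured from the central meridian and from the Equator.

```python
import math

FALSE_EASTING_M = 500_000.0            # value assigned to the central meridian
FALSE_NORTHING_SOUTH_M = 10_000_000.0  # value assigned to the Equator, south zones

def utm_zone(lon_deg):
    """UTM zone number (1 to 60) for a longitude in degrees, east positive."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(max(zone, 1), 60)       # keeps longitude +180 in zone 60

def central_meridian(zone):
    """Longitude of the central meridian of a zone, in degrees."""
    return -183.0 + 6.0 * zone

def apply_false_origin(x_from_cm_m, y_from_equator_m, north_zone=True):
    """Shift projected offsets so that all coordinates in a zone are positive."""
    easting = x_from_cm_m + FALSE_EASTING_M
    northing = y_from_equator_m
    if not north_zone:
        northing += FALSE_NORTHING_SOUTH_M
    return easting, northing

print(utm_zone(-91.1693348), central_meridian(15))
print(apply_false_origin(-150_000.0, -4_500_000.0, north_zone=False))
```

A point 150 km west of the central meridian and 4,500 km south of the Equator therefore receives positive coordinates in its south zone, as intended by the false easting and false northing.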

The UTM coordinate system is common for data and study areas spanning large regions, for example, several State Plane zones. Many data from U.S. federal government sources are in a UTM coordinate system because many agencies manage large areas. Many state government agencies have also adopted a UTM coordinate system, particularly where the state fits predominantly or entirely into one UTM zone.

As noted before, all data for an analysis area must be in the same coordinate system if they are to be analyzed together. If not, the data will not co-occur as they should. The large extent of UTM zones accommodates many analyses, and many organizations, such as national forests or multi-county agencies, have adopted the dominant UTM zone for their area. States that fall predominantly within one zone often adopt a UTM zone for much statewide data, e.g., Utah and UTM zone 12 (Figure 3-40).

We must note that the UTM coordinate system is not always compatible with regional analyses. Because coordinate values are discontinuous across UTM zone boundaries, analyses are difficult across these boundaries. UTM zone 15 is a different coordinate system than UTM zone 16. The state of Wisconsin approximately straddles these two zones, and the state of Georgia straddles zones 16 and 17. If a uniform, statewide coordinate system is required, the choice of zone is not clear, and either one or the other of these zones must be used, or some compromise projection must be chosen. For example, statewide analyses in Georgia and in Wisconsin are often conducted using UTM-like systems that involve moving the central meridian to near the center of each state.


Continental and Global Projections


There are map projections that are often used for display but not for analysis. These maps often cover very large areas, and may severely distort, reshape, or hide parts of the globe. Directions, distances, and areas are typically not measured or computed in these projections, as distortions are too great. Common global projections include variants of the Mercator, Goode, Mollweide, and Miller projections. There is a trade-off that must be made in global projections, between a continuous map surface and distortion.

Figure 3-40: UTM zones for the lower 48 contiguous states of the United States of America. Each UTM zone is 6 degrees wide. All zones in the Northern Hemisphere are north zones, e.g., Zone 10 North.

Distortion in world maps may be reduced by using a cut or interrupted surface. Different projection parameters or surfaces may be specified for different parts of the globe. Projections may be mathematically constrained to be continuous across the area mapped.

Figure 3-41 illustrates an interrupted projection in the form of a Goode homolosine. This projection is based on a sinusoidal projection and a Mollweide projection. These two projection types are merged at parallels of identical scale. The parallel of identical scale in this example is set near the mid-northern latitude of 40° 44' N.


Conversion Among Coordinate 
Systems 

Conversion from one projected coordinate system to another requires using the inverse and forward projection equations, described in an earlier section, passing through the geographic coordinate set. This allows a flexible conversion between any two projections, given our requirement that both the forward and inverse, or "backward," projection equations are specified for any map projection. For example, given a coordinate pair in the State Plane system, you may calculate the corresponding geographic coordinates. You may then apply a formula that converts geographic coordinates to UTM coordinates for a specific zone using another set of equations. Since the backward and forward projections from geographic to projected coordinate systems are known, we may convert among most coordinate systems by passing through a geographic system (Figure 3-42a).
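The two-step conversion can be sketched as a small pipeline: apply the inverse equations of the source projection, optionally apply a datum transformation to the geographic coordinates, then apply the forward equations of the target projection. To keep the sketch short, it uses two spherical equirectangular projections with different parameters as stand-ins for real systems such as State Plane and UTM. All names, parameters, and the 6,371,000 m radius are assumptions for illustration, and the default datum step is an identity placeholder rather than a real transformation.

```python
import math

RADIUS_M = 6371000.0  # assumed spherical Earth radius

def make_equirectangular(lon0_deg, lat_ts_deg, radius=RADIUS_M):
    """Return (forward, inverse) functions for a simple spherical projection."""
    c = math.cos(math.radians(lat_ts_deg))

    def forward(lon_deg, lat_deg):
        x = radius * math.radians(lon_deg - lon0_deg) * c
        y = radius * math.radians(lat_deg)
        return x, y

    def inverse(x, y):
        lon_deg = lon0_deg + math.degrees(x / (radius * c))
        lat_deg = math.degrees(y / radius)
        return lon_deg, lat_deg

    return forward, inverse

def convert(xy, inverse_src, forward_dst, datum_shift=None):
    """Projected (source) -> geographic -> [datum shift] -> projected (target)."""
    lon, lat = inverse_src(*xy)
    if datum_shift is not None:
        lon, lat = datum_shift(lon, lat)   # only when the datums differ
    return forward_dst(lon, lat)

fwd_a, inv_a = make_equirectangular(lon0_deg=-93.0, lat_ts_deg=30.0)
fwd_b, inv_b = make_equirectangular(lon0_deg=-87.0, lat_ts_deg=45.0)

xy_a = fwd_a(-91.1693348, 30.4877456)   # a point in system A
xy_b = convert(xy_a, inv_a, fwd_b)      # the same point in system B
print(xy_a, xy_b)
```

Because every projection supplies both directions, any pair of systems can be connected this way without deriving a direct formula between them.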

Care must be taken when converting among projections that use different datums. If appropriate, we must insert a datum transformation when converting from one projected coordinate system to another (Figure 3-42b). A datum transformation, described earlier in this chapter, is a calculation of the change in geographic coordinates when moving from one datum to another.

Users of GIS software should be careful when applying coordinate projection tools because the datum transformation may be omitted, or an inappropriate datum manually or automatically selected. For some software, the projection tool does not check or maintain information on the datum of the input spatial layer. This will often lead to an inappropriate or no datum transformation, and the output from the projection will be in error. Often these errors are small relative to other errors, for example, spatial imprecision in the collection of the line or point features. As shown in Figure 3-13, errors between NAD83(1986) and NAD83(CORS96) may be less than 10 cm (4 inches) in some regions, often much less than the average spatial error of the data themselves. However, errors due to ignoring the datum transformation may be quite large, for example, tens to hundreds of meters between NAD27 and most versions of NAD83, and errors of up to a meter are common between recent versions of WGS84/ITRF and NAD83. Given the sub-meter accuracy of many new GPS and other GNSS receivers used in data collection, datum transformation error of one meter is significant. As data collection accuracy improves, users develop applications based on those accuracies, so datum transformation errors should be avoided in all cases.

Coordinate System Identifiers


Most commonly used, established projections have been assigned standard identifiers. The EPSG IDs are most common, assigned and maintained by the European Petroleum Survey Group (EPSG). Petroleum engineers were among the first that needed to share completely defined, unambiguous definitions of coordinate systems across the globe, with all projection parameters and datum in standard formats. They have since joined the International Association of Oil and Gas Producers, with a geomatics committee that maintains and publishes coordinate system standards, along with an online registry, currently at https://epsg.org/home.html. Software vendors typically use this site as the source for their projection definitions.

At about the same time, the National Geographic Institute of France, IGNF, developed a set of standard coordinate systems and identifiers for projections, many not used or specified by the EPSG at that time. These are sometimes specified with data, but the EPSG identifiers are more commonly used when both specifications exist. Finally, ESRI, a large GIS software company, has defined codes for commonly used projections that are not specified in either the EPSG or IGNF.

Coordinate system identifiers are often displayed when selecting coordinate systems in software, or on the main map screen, and in documentation describing data sets (Figure 3-43). The EPSG or other identifiers are often used as a shorthand description of the full projection, and reference to them may be confusing if one is unaware of the system of standard identifiers.

Figure 3-43: Example of coordinate system identifiers shown in software.
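As a small illustration of how identifiers act as shorthand, the sketch below lists a few widely used EPSG codes and encodes the numbering pattern for WGS 84 UTM zones, in which north zones are 32600 plus the zone number and south zones are 32700 plus the zone number. The dictionary and function names are illustrative; the EPSG registry remains the authoritative source for any code.

```python
# A few well-known EPSG identifiers and the systems they denote.
EPSG_EXAMPLES = {
    4326: "WGS 84 (geographic)",
    4269: "NAD83 (geographic)",
    26915: "NAD83 / UTM zone 15N",
    32615: "WGS 84 / UTM zone 15N",
}

def wgs84_utm_epsg(zone, north=True):
    """EPSG code for a WGS 84 UTM zone, following the 326xx / 327xx pattern."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones are numbered 1 to 60")
    return (32600 if north else 32700) + zone

code = wgs84_utm_epsg(15, north=True)
print(code, EPSG_EXAMPLES.get(code, "not in the example table"))
```

Note that the same UTM zone has different codes on different datums (26915 versus 32615 here), so an identifier carries both the projection parameters and the datum.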


The Public Land Survey System


For the benefit of GIS practitioners in the United States, we must cover one final land designation system, known as the Public Land Survey System, or PLSS. The PLSS is not a coordinate system, but PLSS points are often used as reference points in the United States, so the PLSS should be well understood for work there.

The PLSS divided lands by north-south lines, 6 miles apart, running parallel to a principal meridian. East-west lines were surveyed perpendicular to these north-south lines, also at six mile intervals. These lines form square townships. Each township was further subdivided into 36 sections, each section approximately a mile on a side. Each section was subdivided further, to quarter-sections (one-half mile on a side) or sixteenth sections (one-quarter mile on a side). Sections were numbered in a zigzag pattern from one to 36, beginning in the northeast corner (Figure 3-44, squares numbered 1 to 36 within the thicker-boundary square).
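The zigzag numbering of sections within a township can be generated with a short sketch. It assumes the pattern described above: section 1 in the northeast corner, numbers increasing westward along the first row, then eastward along the second row, and so on, ending with section 36 in the southeast corner. The function name and the nominal area constants are illustrative; actual sections often depart from the nominal one-mile square.

```python
FEET_PER_MILE = 5280
SQ_FEET_PER_ACRE = 43560
NOMINAL_SECTION_ACRES = FEET_PER_MILE ** 2 / SQ_FEET_PER_ACRE  # one square mile

def township_sections():
    """Return a 6 x 6 grid of section numbers; row 0 is the northern row,
    column 0 is the western column."""
    grid = []
    for row in range(6):
        first = 6 * row
        if row % 2 == 0:
            # Even rows are numbered from east to west.
            grid.append([first + 6 - col for col in range(6)])
        else:
            # Odd rows are numbered from west to east.
            grid.append([first + col + 1 for col in range(6)])
    return grid

for row in township_sections():
    print(" ".join(f"{n:2d}" for n in row))
print(NOMINAL_SECTION_ACRES, NOMINAL_SECTION_ACRES / 4)
```

The nominal areas that follow from a one-mile square are 640 acres for a section and 160 acres for a quarter-section.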

Because the primary purpose of the PLSS survey was to identify parcels, lines and corner locations were considered static on completion of the survey, even if the corners were far from their intended location. Survey errors were inevitable given the large areas and number of different survey parties involved. Rather than invite endless dispute and readjustment, the PLSS specifies that boundaries established by the appointed PLSS surveyors are unchangeable, and that township and section corners must be accepted as true. The typical section contains approximately 640 acres, but due in part to errors in surveying, sections larger than 1200 acres and smaller than 20 acres were also established (Figure 3-44).


The PLSS is a standardized method for designating and describing the location of land parcels, meant to address the shortcomings of metes and bounds surveying, the most common prior method. Metes and bounds describe a parcel relative to features on the landscape, sometimes supplemented with angle or distance measurements. Under metes and bounds a parcel description was often ambiguous. PLSS replaced metes and bounds in the early 1800s, and was used for nearly all land outside the original thirteen colonies. An approximately uniform grid system was established across the landscape, with periodic adjustments incorporated to account for the anticipated error. Parcels were designated by their location within this grid system.


The PLSS is important today for several reasons. First, since PLSS lines are often property boundaries, they form natural corridors in which to place roads, powerlines, and other public services; they are often evident on the landscape (Figure 3-45). Many road intersections occur at PLSS corner points, and these can be viewed and referenced on many maps or imagery used for GIS database development efforts. Thus, the PLSS often forms a convenient system to coregister GIS data layers. PLSS corners and lines are often plotted on government maps (e.g., 1:24,000 quads) or available as digital data.

Figure 3-45: PLSS lines are often visible on the landscape. Roads often follow the section and township lines (above right).


when a GIS is to be developed (Figure 3-46). 
These points may be useful to properly 
locate and orient spatial data layers on the 
Earth's surface. 
Summary


In order to enter coordinates in a GIS, 
we need to uniquely define the location of all 
points on Earth. We must develop a refer- 
ence frame for our coordinate system, and 
locate positions on this system. Since the 
Earth is a curved surface and we work with 
flat maps, we must somehow reconcile these 
two views of the world. We define positions
on the globe via geodesy and surveying. We
convert these locations to flat surfaces via 
map projections. 


We begin by modeling the Earth's shape
with an ellipsoid. An ellipsoid differs from
the geoid, a gravitationally defined Earth
surface, and these differences caused some
early confusion in the adoption of standard
global ellipsoids. There is a long history of
ellipsoidal measurement, and we have
arrived at our best estimates of global and
regional ellipsoids after collecting large,
painstakingly developed sets of precise surface
and astronomical measurements. These
measurements are combined into datums,
and these datums are used to specify the
coordinate locations of points on the surface
of the Earth.

Map projections are a systematic rendering
of points from the curved Earth surface
onto a flat map surface. While there are
many purely mathematical or purely empirical
map projections, the most common map
projections used in GIS are based on developable
surfaces. Cones, cylinders, and
planes are the most common developable
surfaces. A map projection is constructed by
passing rays from a projection center
through both the Earth surface and the developable
surface. Points on the Earth are projected
along the rays and onto the
developable surface. This surface is then
mathematically unrolled to form a flat map.
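
The ray construction summarized above can be sketched for its simplest case. This is a minimal illustration only, assuming a spherical Earth, a projection center at the Earth's center, and a cylinder tangent at the equator (a central cylindrical projection). The function name and the 6,371 km radius are our own assumptions; GIS software normally uses ellipsoidal formulas.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed spherical radius, for illustration only


def central_cylindrical(lat_deg, lon_deg, central_meridian_deg=0.0,
                        radius=EARTH_RADIUS_M):
    """Project a point along a ray from the sphere's center onto a cylinder
    tangent at the equator, then unroll the cylinder into a flat x, y plane."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("rays through the poles never meet the cylinder")
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - central_meridian_deg)
    x = radius * dlon           # arc length along the equator after unrolling
    y = radius * math.tan(lat)  # height at which the ray meets the cylinder
    return x, y


x, y = central_cylindrical(45.0, 10.0)
print(round(x), round(y))
```

Because y grows with the tangent of latitude, distortion increases rapidly away from the line of tangency, which is one reason projection surfaces are placed to intersect or touch the Earth near the area being mapped.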


Standard sets of projections are commonly
used for spatial data in a GIS. In the
United States, the UTM and State Plane
coordinate systems define a standard set of
map projections that are widely used. Other
map projections are commonly used for continental
or global maps, and for smaller
maps in other regions of the world.

A datum transformation is often
required when performing map projections.
Datum transformations account for differences
in geographic coordinates due to
changes in the shape or origin of the spheroid,
and in some cases to datum adjustments.
A datum transformation should be applied as a
step in the map projection process when
input and output datums differ.
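
One simple way to picture a datum transformation is a three-parameter geocentric translation: convert geographic coordinates to Earth-centered X, Y, Z on the source ellipsoid, shift the origin, and convert back on the target ellipsoid. The sketch below is hedged throughout. The function names are ours, the translation values are illustrative assumptions rather than authoritative parameters, and the iterative inverse is not suitable near the poles. Real projects should rely on published, region-specific parameters or grid-based tools.

```python
import math

# Ellipsoids as (semi-major axis in meters, inverse flattening).
CLARKE_1866 = (6378206.4, 294.9786982)
WGS84 = (6378137.0, 298.257223563)

# Illustrative origin shift in meters (an assumption for this sketch only).
DX, DY, DZ = -8.0, 160.0, 176.0


def geodetic_to_ecef(lat_deg, lon_deg, h, ellipsoid):
    """Geographic coordinates and ellipsoidal height to Earth-centered X, Y, Z."""
    a, inv_f = ellipsoid
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, ellipsoid, iterations=10):
    """Iterative inverse of geodetic_to_ecef (not valid at the poles)."""
    a, inv_f = ellipsoid
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - e2))
    h = 0.0
    for _ in range(iterations):
        n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - e2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


def shift_datum(lat_deg, lon_deg, h, src, dst, dx, dy, dz):
    """Apply a geocentric translation between two datums."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h, src)
    return ecef_to_geodetic(x + dx, y + dy, z + dz, dst)


print(shift_datum(45.0, -93.0, 250.0, CLARKE_1866, WGS84, DX, DY, DZ))
```

The same latitude and longitude numbers refer to different ground positions under different datums, which is why this step belongs in the projection workflow whenever input and output datums differ.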

A system of land division known as the 
Public Land Survey System (PLSS) was 
established in the United States. This is not a 
coordinate system, but rather a method for 
unambiguously and systematically defining
parcels of land based on regularly spaced
survey lines in approximately north-south
and east-west directions. Intersection coordinates
have been precisely measured for
many of these survey lines, and are often
used as a reference grid for further surveys
or land subdivision. 




Study Questions


3.1 - Describe how Eratosthenes estimated the circumference of the Earth. What
value did he obtain? 


3.2 - Assume the Earth is approximately a sphere (not an ellipsoid). Also assume
you've repeated the measurements of Posidonius, shown in the figure below. What is
your estimate of the radius of the Earth's sphere given the following distance/angle
pairs? Note that the distances are given below in meters, and angles in degrees, and
calculators or spreadsheets may require you to enter angles in radians for trigonometric
functions (1 radian = 57.2957795 degrees):

a) angle θ = 1° 18' 45.79558", distance = 146,000 meters

b) angle θ = 0° 43' 32.17917", distance = 80,500 meters

c) angle θ = 0° 3' 15.06032", distance = 6,000 meters


[Figure: geometry of the Posidonius measurement, showing the star Canopus, the measured angle and surface distance, and the radius from the center of the Earth.]
3.3 - Assume the Earth is approximately a sphere (not an ellipsoid). Also assume
you've repeated the measurements of Posidonius. What is your estimate of the radius
of the Earth's sphere given the following distance/angle pairs? Note that the distances
are given in meters, and angles in degrees, and calculators or spreadsheets may require
you to enter angles in radians for trigonometric functions (1 radian = 57.2957795
degrees):

a) angle = 2° 59' 31.33325", distance = 332,000 meters

b) angle = 9° 12' 12.77201", distance = 1,020,708 meters

c) angle = 1° 2' 12.15566", distance = 115,200 meters
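
The unit handling these questions call for can be sketched as follows. This is a hedged helper, not a solution: the function names are ours, and the sample angle and distance are made-up values unrelated to the questions. It relies only on the spherical relation that arc length equals radius times the angle in radians.

```python
import math


def dms_to_radians(degrees, minutes, seconds):
    """Convert an angle in degrees, minutes, and seconds to radians."""
    decimal_degrees = degrees + minutes / 60.0 + seconds / 3600.0
    return math.radians(decimal_degrees)


def radius_from_arc(distance_m, angle_rad):
    """On a sphere, arc length s = R * theta, so R = s / theta."""
    return distance_m / angle_rad


# Made-up sample values: a 7° 12' 0" angle subtending 800,000 meters.
theta = dms_to_radians(7, 12, 0.0)
print(radius_from_arc(800_000.0, theta))
```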


3.4 - What is an ellipsoid? How does an ellipse differ from a sphere? What is the
equation for the flattening factor?


3.5 - Provide three reasons why there have been various estimates for Earth's
ellipsoid radii.


3.6 - Define the geoid. Tell how it differs from the ellipsoid, and from the surface of 
the Earth. Describe how we measure the position of the geoid.


3.7 - Define a parallel or meridian in a geographic coordinate system. Describe where
the zero lines occur. 


3.8 - How does magnetic north differ from the geographic North Pole? 
3.9 - Define a datum. Describe how datums are developed. 


3.10 - Why are there multiple datums, even for the same place on Earth? Define what 
we mean when we say there is a datum shift.


3.11 - What is a triangulation survey, and what is a bench mark?
3.12 - Why do we not measure vertical heights relative to mean sea level? 
3.13 - What is the difference between an orthometric height and a dynamic height? 
3.14 - Use the NCAT software available from the U.S. NOAA/NGS website
(https://www.ngs.noaa.gov/NCAT/) to fill the following table. Note that all of these
points are in the continental United States (CONUS), and longitudes are west, but
entered as positive numbers.

[Table: columns NAD27, NAD83(86), and HPGN]
3.15 - Use the NCAT software available from the U.S. NOAA/NGS website
(https://www.ngs.noaa.gov/NCAT/) to fill the following table. Note that all of these
points are in the continental United States (CONUS), and longitudes are west, but
entered as positive numbers.

[Table: columns NAD27, NAD83(86), and HPGN]
3.16 - Use the World Wide Web version or download and start the HTDP software
from the U.S. NOAA/NGS site (at the time of this writing,
https://www.ngs.noaa.gov/TOOLS/Htdp/Htdp.shtml) and complete the following table. Use
the tool for a horizontal displacement between two dates. Enter epoch start and stop
dates of January 1, 1986 and January 1, 2015, respectively. Specify a zero height or z
for your datum transformation. Use the spherical Earth approximation formulas
described in Chapter 2 when calculating the surface shift distance, in centimeters
(cm), assuming a radius of 6,371 kilometers. Report the ground shift from 1986 to the
2015 time period.
3.17 - Use the World Wide Web version or download and start the HTDP software
from the U.S. NOAA/NGS site (at the time of this writing,
https://www.ngs.noaa.gov/TOOLS/Htdp/Htdp.shtml), and complete the following table. Use
the tool for a horizontal displacement between two dates. Enter epoch start and stop
dates of January 1, 1986 and January 1, 2015, respectively. Specify a zero height or z
for your datum transformation. Use the spherical Earth approximation formulas
described in Chapter 2 when calculating the surface shift distance, in centimeters
(cm), assuming a radius of 6,371 kilometers. Report the ground distance between
points from the 1986 to the 2015 time period.
3.18 - Use the VDatum software (available at the time of this writing at
https://vdatum.noaa.gov/) to complete the table. Note that all longitudes are west,
entered as negative numbers.

[Table: columns NAD27 and NAD83(2011)]


3.19 - Use the VDatum software (available at the time of this writing at
https://vdatum.noaa.gov/) to complete the table. Note that longitudes are west,
entered as negative numbers.

[Table: columns NAD27 and NAD83(2011)]


3.20 - Use the VDatum software to calculate the orthometric height change, in
centimeters, for the listed NAD83(2011) geographic coordinates, when switching from the
NAVD(geoid12A) as the source, to geoids for 2009, 1999, and 1996 respectively.

3.21 - Use the VDatum software to calculate the orthometric height change, in
centimeters, for the listed NAD83(2011) geographic coordinates, when switching from the
NAVD(geoid12A) as the source, to geoids for 2009, 1999, and 1996 respectively.


3.22 - a) You wish to site a seaside hospital in San Diego, CA. Using gauge 9410170,
report the NAVD88 elevation you wish to use as a threshold if you want the site to be
at least 30 feet above the mean high water mark (the search function in the upper left
corner of the tides website, described in the tides section of this chapter, will help
speed the search for the station information. Then look for a tides and water levels,
datums section).

b) What is the height difference between the gauge NAVD88 height and the mean low-
low water mark for the station?


3.23 - a) You wish to dredge a channel near Naples, FL, near the NOAA tidal gauge
Station 8725110. You wish to maintain a channel depth of 30 feet below the mean sea
level. What is the channel depth elevation expressed as an NAVD88 height, in feet?
(The search function in the upper left corner of the tides website, described in the tides
section of this chapter, will help find the station. Then look for a tides and water
levels, datums section.)


b) What is the mean high-high water mark, expressed in feet, as a NAVD88 height?


3.24 - What is a developable surface? What are the most common shapes for a
developable surface?


3.25 - Look up the NGS control sheets for the following points, and record their
horizontal and vertical datums, latitudes, and longitudes:

DOG, Maine, PID = PD0617

Key West GSL, Florida, PID = AA1645

Neah A, Washington, PID = AF8882


3.26 - Look up the NGS control sheets for the following points, and record their hor- 
izontal and vertical datums, latitudes, and longitudes: 

Denver, Colorado, PID = KK1544 

Loma East, CA, PID = AC6092

Austin CE, Texas, PID = DN7664
3.27 - Using the spheroid formula given in this chapter, calculate the great circle
distance to the nearest kilometer for the control points in question 3.25 above from:


- DOG to Neah A
- Key West to DOG
- Neah A to Key West


3.28 - Calculate the great circle distance for the control points in question 3.26
above from:

- Denver to Loma East

- Denver to Austin CE

- Austin CE to Loma East


3.29 - Describe the State Plane coordinate system. What type of projections are used
in a State Plane coordinate system?


3.30 - Define and describe the Universal Transverse Mercator coordinate system.
What type of developable surface is used with a UTM projection? What are UTM
zones, where is the origin of a zone, and how are negative coordinates avoided?


3.31 - What is a datum transformation? How does it differ from a map projection? 


3.32 - Specify which type of map projection you would choose for each country, 
assuming you could use only one map projection for the entire country, the projection
lines of intersection would be optimally placed, and you wanted to minimize overall 
spatial distance distortion for the country. Choose from a transverse Mercator, a Lam- 
bert conformal conic, or an Azimuthal: 

Benin Bhutan 


Slovenia Israel 


3.33 - Specify which type of map projection you would choose for each country, 
assuming you could use only one map projection for the entire country, the projection
lines of intersection would be optimally placed, and you wanted to minimize overall 
spatial distance distortion for the country. Choose from a transverse Mercator, a Lam- 
bert conformal conic, or an Azimuthal: 

Chile Nepal 

Kyrgyzstan The Gambia 


3.34 - Describe the Public Land Survey System. Is it a coordinate system? What is its
main purpose?
4 Maps, Data Entry, Editing, and Output


Building a GIS Database 


Introduction 


Spatial data entry and editing are fre- 
quent activities for many GIS users. Each 
coordinate pair needed to represent features 
in a GIS must be entered, reviewed, and 
sometimes revised in a GIS database. This is 
often painstakingly slow, even with auto- 
mated techniques, taking significant time for 
many organizations. 

Before digital computers, most spatial
data were stored on hardcopy maps (Figure
4-1). These are any documents drawn, written,
or printed on physical media, including
maps and associated tabular data. Most electronic
data were converted from hardcopy
sources in the early years of GIS via digitizing,
the process of collecting digital coordinates
from maps. Digitizing is a common
data entry method today, now primarily from
digital images.

Figure 4-1: Maps have served to store geographic knowledge for at least the past 4000 years.

Digital maps are an electronic, graphic
depiction of spatial data, now our most common
map form (Figure 4-2). Millions of
electronic maps are generated each hour,
composed on demand in response to web
queries, for navigation systems, and for
commerce. These maps are flexible, easily
customized, inexpensively distributed, and
often dynamic.


Most maps, whether digital or hardcopy,
contain several components (Figure 4-3). A
data area or pane occupies the largest part of
the map, and contains most of the depicted
spatial data. A neatline is often included to
provide a frame around all map elements,
and insets may contain additional map elements.
Scale bars, legends, titles, and other
graphic elements such as a north arrow are
often included. All maps have a map scale,
defined as the ratio of the distance on the
map to the corresponding distance on the
ground.

Figure 4-2: An example of a commonly produced digital map (courtesy Google).

Figure 4-3: An example of a map and its components, including the media edge,
neatline, legend, north arrow, data pane, and scale bar.


Maps often depict coordinate lines (Figure
4-4). When the lines represent constant
latitude and longitude they are called a graticule
(Figure 4-4). These lines may appear
curved, depending on the map projection
and scale. Maps may also depict a grid consisting
of lines of constant Cartesian coordinates.
Grid lines are typically drawn in both
the X and Y directions, and appear straight
on most maps (Figure 4-4). Graticules and
grids are useful because they provide a reference
against which location may be quickly
estimated.

Aerial and satellite images are often
included as a base in digital maps, but they
are not maps, although the line is becoming
blurred. Uncorrected images usually provide
a distorted, non-Cartesian rendering of the
Earth's surface unless they are processed to
create orthoimages. In addition, images are
not maps because features of interest such as
rivers or mountain ranges are not explicitly
identified. Images are a rich source of spatial
data, and standard techniques may be used to
extract features through manual digitizing,
described later in this chapter, or image
classification, described in Chapter 6.
Outdoor digitizing, usually with the use of
Global Navigation Satellite Systems (GNSS)
such as the U.S. Global Positioning System
(GPS), is a common source of digital data,
and is described in detail in Chapter 5. This
chapter focuses on human-guided coordinate
capture, primarily from images and maps,
and on map output.


Accuracy vs. Resolution 


We start by contrasting positional accuracy
and spatial resolution, as these concepts
are often confused, to the detriment of digital
data development. Accuracy is how close
a value is to the truth; here, how close an X
or Y value in our data set is to the true value
for a feature on the Earth. Formal accuracy
assessment is covered in Chapter 14; however,
the concept is important enough to
introduce here. We typically quantify accuracy
as some average or range of errors, e.g.,
stating that the measured average error for light
pole locations was 2.5 meters. All data have
positional error, typically due to the way we
collected or processed the data.
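
An average positional error of this kind can be computed from paired coordinates. The following is a minimal sketch, assuming projected X, Y coordinates in meters and a set of higher-accuracy "true" positions for the same features. The function names and the sample light pole coordinates are hypothetical.

```python
import math


def positional_errors(measured, truth):
    """Horizontal distance between each measured point and its true position."""
    return [math.hypot(mx - tx, my - ty)
            for (mx, my), (tx, ty) in zip(measured, truth)]


def mean_error(measured, truth):
    """Average of the per-point positional errors."""
    errors = positional_errors(measured, truth)
    return sum(errors) / len(errors)


# Hypothetical light pole coordinates, in meters.
digitized = [(503.0, 204.0), (610.0, 350.0)]
surveyed = [(500.0, 200.0), (610.0, 350.0)]
print(mean_error(digitized, surveyed))
```

An average is only one possible summary; a range or percentile of the same error list may be more informative when a few points are badly misplaced.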

Spatial resolution in a data set or display
is the smallest feature that is individually
resolved. We often note the resolution of a
raster data set as the original collected cell
size, e.g., a 10-meter resolution satellite
image, or a 6-inch resolution aerial photograph.
The resolution is often not the same
as the accuracy. Many image data sets have a
cell size that is smaller than their
average positional error. For example, a drone
image may have a very fine resolution, but
image processing may yield average errors
over 1 meter, a substantially lower accuracy
than resolution (Figure 4-5).


Vector data also have both accuracies
and resolutions. The accuracy is determined
by the error associated with point and vertex
locations. Accuracy is how close the coordinate
pairs that define features are to the true
values. Resolution is a bit less clear with
vector data. Boundary-defining vertices
may be any distance apart, and so there is no
one resolution for a vector data set. There is
a distribution of intervals between adjacent
vertices, and the resolution is typically some
threshold length, e.g., the average distance
between vertices. The resolution for vector
data may be set by image source materials,
as it can't be smaller than image resolution,
but the operator may further degrade resolution
by sampling at larger intervals. Resolution
is also limited by the equipment used
and operator attentiveness while digitizing.
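
The distribution of vertex intervals mentioned above is straightforward to summarize. This sketch assumes a line stored as an ordered list of projected X, Y pairs; the function names are illustrative, and the average is only one of several thresholds that could stand in for a vector "resolution."

```python
import math


def vertex_spacings(vertices):
    """Distances between each pair of adjacent vertices along a line."""
    return [math.dist(p, q) for p, q in zip(vertices, vertices[1:])]


def mean_vertex_spacing(vertices):
    """Average interval between adjacent vertices."""
    spacings = vertex_spacings(vertices)
    return sum(spacings) / len(spacings)


# Hypothetical digitized line, coordinates in meters.
shoreline = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(vertex_spacings(shoreline), mean_vertex_spacing(shoreline))
```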


Scale 


All displayed spatial data have a scale, a
relationship between a distance on the screen
or paper and a corresponding distance on
Earth. Map scale is often reported as a distance
conversion, such as one inch to a mile,
meaning one inch on the map equals one
mile on Earth. Scale may also be expressed
as a unitless ratio, such as 1:24,000, indicating
a unit distance on a display is equal to
24,000 units on Earth's surface. Digital maps
most often use a third method to report scale,
as a bar or line of known distance, labeled on
the map (Figure 4-6), although many mapping
programs automatically adjust the scale
according to the display zoom.
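
The two numeric ways of stating scale are interchangeable, as this short sketch shows. The function names are our own. The only fixed fact used is that a mile contains 63,360 inches, so "one inch to a mile" corresponds to a unitless ratio of 1:63,360.

```python
INCHES_PER_MILE = 5280 * 12  # 63,360 inches in one mile


def ground_distance(map_distance, scale_denominator):
    """Ground distance, in the same units as the map distance, for a 1:N scale."""
    return map_distance * scale_denominator


def denominator_from_conversion(map_units, ground_units_in_map_units):
    """Unitless scale denominator once both distances share the same units."""
    return ground_units_in_map_units / map_units


# One inch on a 1:24,000 map spans 24,000 inches (2,000 feet) on the ground.
print(ground_distance(1.0, 24_000) / 12)
# "One inch to a mile" expressed as a unitless ratio denominator.
print(denominator_from_conversion(1.0, INCHES_PER_MILE))
```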

Displaying a scale as a distance conversion
or unitless ratio may be in error on a static
digital document such as a pdf, jpeg, or png
file because the scale is altered by the zoom
level. This changes the magnification on an
electronic display without the ability to automatically
recompute scale. Digital documents
should most often depict a graphic
scale bar, embedded in the map.


The notion of large vs. small scale is
often confused because scale implies a ratio.
A larger ratio signifies a large-scale map, so
a 1:24,000 scale map is considered large-scale
relative to a 1:100,000 scale map, because
1/24,000 is a larger fraction than 1/100,000.
Many people mistakenly refer to a 1:100,000
scale map as larger scale than a 1:24,000
scale map because it covers a larger area. It
helps to remember that features are larger on
a large-scale map (Figure 4-6), and that
large-scale maps often show more detail but
less area. Notice in Figure 4-6, the larger-scale
map at the top shows details of Tokyo
city. Tokyo shrinks in the successively
smaller-scale maps, but larger additional
areas are covered.
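
Treating each scale as a fraction makes the comparison mechanical. The sketch below uses hypothetical function names and a made-up 240-meter feature to show that the same feature is drawn larger on the larger-scale map.

```python
from fractions import Fraction


def larger_scale(denominator_a, denominator_b):
    """Return the denominator of the larger-scale map (the larger 1/N fraction)."""
    if Fraction(1, denominator_a) > Fraction(1, denominator_b):
        return denominator_a
    return denominator_b


def map_length_mm(ground_length_m, scale_denominator):
    """Length on the map, in millimeters, of a feature with a given ground length."""
    return ground_length_m * 1000.0 / scale_denominator


print(larger_scale(24_000, 100_000))
print(map_length_mm(240.0, 24_000), map_length_mm(240.0, 100_000))
```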

Because maps often report an average 
scale, and because there are upper limits on 
the accuracy with which data can be plotted 
on a map, large-scale maps generally have
less geometric error than small-scale maps.
Small errors in measurement, plotting, and
hardcopy printing are magnified more on a 
small-scale map than a large-scale map. 
Figure 4-6: Map coverage, relative distance, and detail change from larger scales (top) to
smaller scales.
Map and Data Generalization


Spatial data and maps are abstractions of
reality. This abstraction introduces map generalization,
the unavoidable approximation
of real features when they are represented in
digital data or on a map. Not all the geometric
or attribute detail of the physical world
is recorded; only the most important characteristics
are included. The set of features
that are most important is subjectively
defined and will differ among users, but
many projects aim to produce broadly useful
data.

The choice of data sources and digitiz- 
ing methods will unavoidably set limits on 
the size and shape of features that may be 
represented. Consider mapping lakes, based
on image data with a 250-meter cell size
(Figure 4-7, left). The abstraction of the
shoreline will not represent bays and penin- 
sulas that are smaller than approximately 
250 meters across. Small features will be 
missed, edge detail will be lost, and dis- 
tances along boundaries will depend on the 
resolution of the source image. 

A finer resolution source, such as a 30-meter
image (Figure 4-7, right), may more
faithfully depict map detail, but may not be
an appropriate choice. The finer resolution
may be more expensive to create, difficult to
reproduce, unavailable for the entire mapping
area, or out of date. Even at higher resolutions,
generalization is unavoidable in data
collection. One must evaluate any data creation
effort to ensure that it produces data
useful for its intended purpose.

Output maps contain all the original data
generalizations, and may further generalize
if displayed at smaller scales. Cartographers
often must balance several factors in map
design, and their choices may lead to features
being incompletely represented.

Many kinds of feature generalization are
common in data and on maps (Figure 4-8).
We omit detail due to limits on the time,
methods, or materials available when collecting
geographic data. Limits may also
apply when compiling, displaying, or
exporting a map. These feature generalizations,
depicted in Figure 4-8, may be classed
as:


Fused: multiple features may be
grouped to form a larger feature,


Simplified: boundary or shape details
are lost or "rounded off,"


Displaced: features may be offset to
prevent overlap or to provide a standard
distance between mapping symbols,

Omitted: small features in a group
may be excluded from the map, or

Exaggerated: standard symbol sizes
are often chosen, for example, standard
road symbol widths, which are much
larger when scaled than the true road
width.
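
The "simplified" class can be made concrete with a line-simplification routine. The text does not name an algorithm; the sketch below uses the Douglas-Peucker method, one common choice that we have swapped in for illustration, with hypothetical function names and a made-up tolerance. It drops vertices that lie within the tolerance of the line joining the retained vertices.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of an ordered list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every interior vertex is within tolerance
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid repeating the shared vertex


line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(simplify(line, 1.0))
```

A larger tolerance removes more shape detail, which mirrors the loss of bays and peninsulas when a shoreline is captured from a coarser source.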


Generalization can be a problem
because it removes both spatial and attribute
detail. If the main feature of interest in an
analysis often gets fused into surrounding
regions, errors of omission may dominate.
For example, the prairie pothole region of
central North America is dominated by
uplands, interspersed with small wetlands,
many smaller than a few hundred square meters
in size. These wetlands may comprise a
quarter of the landscape, but are omitted in
coarser resolution data sets, and hence lead
to erroneous analysis of habitat suitability
for breeding birds, crop production in
wet years, or likelihood of flooding.

Generalization is present at some level 
in almost every data layer or map, and 
should be recognized and evaluated for each 
data source in a GIS. If generalization results
in omission or degradation of data beyond
acceptable levels, then the analyst should
switch to a larger-scale map if appropriate
and available, or return to the field or original
source materials to collect data at the
required precision. 


Figure 4-8: Generalizations common in maps and data layers.
Digitizing: Coordinate Capture


Digitizing is the process by which coordinates
are captured into a data layer in a
GIS. The point, line, and area coordinate
values that define the locations and shapes of
entities must be captured, that is, recorded as
numbers and structured in the spatial database.


Manual digitization is human-guided
coordinate capture from a map or image
source. The operator guides an electronic
device over a map or image and signals the
capture of important coordinates, often by
pressing a button on the digitizing device.
Important point, line, or area features are
traced on the source materials, and the coordinates
are recorded in GIS-compatible formats.
Valuable data on historical maps may
be converted to digital forms through the use
of manual digitizing. On-screen digitizing
and hardcopy digitizing are the two most
common forms of manual digitization.


On-Screen Digitizing 


On-screen digitizing involves identifying
and digitizing features on a digital
image. Digitizing software allows the operator
to trace the points, lines, or polygons that
are identified on the image (Figure 4-9) and
saves the coordinates and added attribute
data into spatial data layers. The operator
may also specify the type of feature to be
recorded, the extent and magnification of the
image on screen, the mode of digitizing, and
other options to control how data are input.
The operator typically guides a cursor over
points to be recorded using a mouse or other
pointing device, and depresses a button or
sequence of buttons to collect the point coordinates.
Much image data produced for spatial
data entry since 2010 is delivered as
orthoimages or orthophotographs, with the
inherent image distortion removed, in a standard,
projected coordinate system, so that
data may be directly extracted from the
images into a spatial data layer.


Field Digitizing 


Features or attributes may also be
directly digitized while in the field, and this
is the second most common way of modern
digital data collection. The user occupies the
target position and collects coordinates with
a positioning device, typically a satellite-based
receiver. These Global Navigation
Satellite Systems (GNSS) are described in
detail in Chapter 5, but the systems are
briefly introduced here to underscore their
importance and their interaction with manual
digitizing.

Point locations may be recorded by a
single observation (sometimes called a position
fix) or by averaging multiple fixes for
more accurate locations. Lines or polygons
may be digitized while moving, recording a
vertex at a specified time or distance interval.
Attributes may be collected while in the
field, e.g., fire hydrant type, road incident, or
wetland identifier.
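Averaging multiple fixes can be sketched in a few lines. This is a minimal illustration, assuming the fixes are already in a projected coordinate system with units of meters; the function name `average_fixes` and the sample coordinates are hypothetical.

```python
from statistics import mean


def average_fixes(fixes):
    """Return the mean (x, y) of repeated position fixes for one point."""
    xs, ys = zip(*fixes)
    return mean(xs), mean(ys)


# Three hypothetical fixes collected while occupying the same location.
fixes = [(500012.0, 4649871.0), (500010.0, 4649873.0), (500011.0, 4649872.0)]
print(average_fixes(fixes))
```

Random errors in the individual fixes tend to cancel in the mean, which is why the averaged position is usually closer to the true location than any single fix.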

Field digitizing often starts with an 
existing, manually digitized data layer, with 
field coordinates or observations used to 
update the existing features. Data are often 
transferred via software to a computer, and
usually require subsequent manual editing,
using methods described in following sections
of this chapter.


Hardcopy Digitizing

Hardcopy digitizing is another form of
data entry, now infrequently used. It
involves digitizing from a paper, plastic, or
other hardcopy map. An operator securely
attaches a map to a digitizing surface and
traces lines or points with an electrically sensitized
puck (Figure 4-10). The operator
manually identifies each shape-defining
location to record points,
lines, and polygons into a vector data set.

The most common digitizers are based
on a wire grid embedded in or under a table.
Depressing a button specifies the puck location
relative to the digitizer coordinate system.
The digitizer coordinates are then
converted to a projected coordinate system
through software and control points. While
once a major method for capturing spatial
data, hardcopy map digitizing is now rare
because most paper documents have been
converted to digital forms.


Figure 4-10: Manual digitizing on a digitizing table.


Characteristics of Manual Digitizing

Manual digitizing is common because it 
provides sufficiently accurate data for most 
applications. Proper digitizing adds little 
error to that already in the source materials 
and uses readily available equipment. 
Humans are still better than machines at 
interpreting image detail, particularly from 
non-ideal images, those collected with poor 
lighting, hazy skies, or for complex targets.
Short training periods are required and data
quality may be frequently evaluated.

Manual digitizing may introduce spatial
error in several ways. Use of uncorrected,
non-orthometric images is a common source
of spatial error. Images are inherently distorted.
In some rare cases these distortions
are small, but almost all images should be
processed to remove geometric error. If
using corrected images, generalization and
pointing error, magnified by the display
scale, often have the largest impact on spatial
data accuracy.

Errors due to human pointing ability are 
the base limit on data accuracy, but may be 
reduced when on-screen digitizing by zoom- 
ing in to larger scales as needed. Zooming 
does not remove errors inherent in original 
images, and frequent changes in scale, while 
reducing added spatial errors, come at the 
cost of reduced digitizing efficiency. One 
test using a high-precision digitizing table 
revealed digitizing errors averaging approximately
0.067 mm (Figure 4-11). Errors followed
a random normal distribution, and
varied significantly among operators. These
average errors translated to an approximately
1.6-meter error when scaled from the
1:24,000 scale map to a ground-equivalent
distance.
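The scaling from a map or display error to a ground distance is a single multiplication by the scale denominator. The sketch below works through the 0.067 mm example, along with 1 mm errors at two display scales; the function name `ground_error_m` is only illustrative.

```python
def ground_error_m(map_error_mm: float, scale_denominator: float) -> float:
    """Convert an error measured on the map or display (mm) to ground distance (m)."""
    return map_error_mm * scale_denominator / 1000.0


for map_error_mm, scale in [(0.067, 24_000), (1.0, 24_000), (1.0, 1_000_000)]:
    ground = ground_error_m(map_error_mm, scale)
    print(f"{map_error_mm} mm at 1:{scale:,} is about {ground:.1f} m on the ground")
```

The same pointing error therefore costs roughly forty times more ground accuracy at 1:1,000,000 than at 1:24,000.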

Pointing errors may be magnified by
the source display scale. Table 4-1 illustrates
the effects of scale on data quality. Errors of
1 millimeter (0.039 in) on a 1:24,000 display
scale correspond to 24 meters (79 ft) on the
surface of the Earth. This same 1 millimeter
error on a 1:1,000,000 scale display corresponds
to 1,000 meters (3,281 ft) on the
Earth's surface. Thus, small errors in image
data collection, production, or interpretation
may cause significant positional errors when
scaled to distances on Earth, and these errors
are greater the smaller the display scale.

The person digitizing affects the geometric
quality of manually digitized data.
Operators differ in visual acuity, hand steadi- 
ness, attention to detail, and ability to con- 
centrate. Work quality may vary through 
time due to fatigue or loss of focus with 
repetitive tasks. Digitizers should take fre- 
quent breaks, do quality and consistency 
checks, and compare to other digitizers to 
ensure accurate, consistent data collection. 


The Manual Digitizing Process 


Manual digitizing involves manually
positioning the puck or cursor over each target
point on an image or map and collecting
coordinate locations. This position/collect
step is repeated for every point to be captured.
Lines have a starting node, then a set of
vertices defining the line shape, and an ending
node (Figure 4-12). Hence, lines may be
viewed as a series of straight segments connecting
vertices and nodes. Polygons are
connected lines that enclose areas.

Digitizing is often in point mode, where
the operator must depress a button or otherwise
signal to the computer to sample each
point, or in stream mode, where points are
automatically sampled at a fixed time or distance
frequency, perhaps once each meter.

Figure 4-12: Nodes define the starting and ending points of a line.

The stream sampling rate must be specified
with care to avoid over- or under-sampled
lines. Too short a collection interval
results in redundant points. Too long a collection
interval may cause the loss of important
spatial detail. When using time-triggered
stream digitizing, the operator must continuously
move the digitizing puck; pausing for
a period longer than the sampling interval
digitizes multiple points clustered together.
This often creates a "rat's nest" of lines that
must later be removed.

Minimum distance digitizing avoids
some of the problems inherent with time-sampled
streaming. In minimum distance
digitizing, a new point is not recorded unless
it is more than a minimum threshold distance
from the previously recorded point, so a pause
does not create a rat's nest of line segments.
The threshold must be chosen carefully, neither
too large, missing useful detail, nor too small,
in effect reverting to stream digitizing.
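Minimum distance digitizing can be sketched as a simple filter on a stream of cursor positions. This is only an illustration under assumed names: `min_distance_filter`, the sample stream, and the threshold are hypothetical, and real digitizing software applies the test as points arrive rather than to a stored list.

```python
import math


def min_distance_filter(stream, min_dist):
    """Keep a streamed point only if it is at least min_dist from the last kept point."""
    kept = []
    for x, y in stream:
        if not kept or math.hypot(x - kept[-1][0], y - kept[-1][1]) >= min_dist:
            kept.append((x, y))
    return kept


# The operator pauses near (0, 0) and again near (1.5, 0); the clustered
# samples are dropped instead of forming a rat's nest.
stream = [(0, 0), (0.2, 0), (0.1, 0.1), (0.3, 0.0), (1.5, 0), (1.6, 0.1), (3.0, 0)]
print(min_distance_filter(stream, 1.0))
```

Raising `min_dist` thins the line further but risks skipping shape-defining bends, which is the trade-off the paragraph above describes.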


Node and Line Snapping 


Positional errors are inevitable when 
data are manually digitized. These errors 
should be small relative to the intended use
of the data; for example, the positional errors
may be limited to 2 meters when only 5-meter
accuracy is required. These acceptable
errors may still prevent the generation of
correct networks or polygons. For example,
river flow analysis may be in error because 
major tributaries do not connect, or polygon 
features may be incomplete because their 
boundaries don’t close. These small errors 
must be removed or avoided during digitiz- 
ing. Figure 4-13 shows some common digi- 
tizing errors. 

Undershoots and overshoots often occur
when digitizing. Undershoots are nodes that
do not quite reach the line or another node,
and overshoots are lines that cross over
existing nodes or lines (Figure 4-13). Undershoots
cause unconnected networks and
unclosed polygons. Overshoots typically do
not cause problems when defining polygons,
but they may cause difficulties in
line networks.

Figure 4-13: Common digitizing errors.

Node snapping and line snapping are 
used to reduce undershoots and overshoots
while digitizing. Snapping is a process of
automatically setting nearby points to have
the same coordinates. Snapping relies on a
snap tolerance or snap distance. This distance
may be interpreted as a minimum distance
between features. Nodes or vertices
closer than this distance are moved to
occupy the same location (Figure 4-14).
When digitizing, an existing node or vertex
becomes "magnetic," and pulls a new node
or vertex to it within the snap distance. Node
snapping prevents a new node from being
placed within the snap distance of an already
existing node. Remember that nodes are
used to define the ending points of a line. By
snapping two nodes together, we ensure a
connection between digitized lines.

Figure 4-14: Nodes and vertices before snapping and after snapping.

Line snapping, sometimes called edge 
snapping, may also be specified. Line snapping
inserts a node at a line crossing and
clips the end when a small overshoot is digitized.
This forces a node to connect to a
nearby line while digitizing, but only when
the undershoot or overshoot is less than the
snapping distance. When used properly, line
and node snapping reduce the number of
undershoots and overshoots. Closed polygons
or intersecting lines are easier to digitize
accurately and efficiently under node
and line snapping.

The snap distance must be carefully 
selected. If too short, then snapping has little 
impact. Consider an operator and equipment 
that rarely digitizes endpoints within 5
meters of their intended location. If the snap
tolerance is set to 1 meter, few points will be
snapped to target lines, leaving frequent dangles
and incomplete polygons. Conversely,
we sacrifice accuracy if the snap tolerance is
set too large. With a snap tolerance of 10
meters, feature boundaries less than 10
meters apart cannot be recorded. The snap
distance should be smaller than the desired
positional accuracy, such that the needed
detail contained in the digitized data is 
recorded. Careful selection of the snap dis- 
tance should reduce digitizing errors and sig- 
nificantly reduce time required for later 
editing. 
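The effect of the snap distance on node snapping can be sketched as follows. This is a minimal illustration, not the behavior of any particular GIS package; `snap_node` and the sample coordinates are hypothetical, and line (edge) snapping is not covered.

```python
import math


def snap_node(new_node, existing_nodes, snap_distance):
    """Return the nearest existing node within snap_distance, else new_node unchanged."""
    best, best_d = None, snap_distance
    for node in existing_nodes:
        d = math.dist(new_node, node)
        if d <= best_d:
            best, best_d = node, d
    return best if best is not None else new_node


existing = [(100.0, 100.0), (250.0, 80.0)]
digitized = (101.2, 99.1)  # about 1.5 m from the first existing node

print(snap_node(digitized, existing, 2.0))  # tolerance large enough: snaps
print(snap_node(digitized, existing, 1.0))  # tolerance too short: left as a dangle
```

With a 2-meter tolerance the new node takes the coordinates of the existing node, so the two lines connect; with a 1-meter tolerance the 1.5-meter miss is preserved as an undershoot.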


Reshaping: Line Smoothing and 
Thinning 

Digitizing software may provide tools to 
smooth, densify, or thin points while entering
data. One common technique uses spline
functions to smoothly interpolate curves
between digitized points, and thereby both
smooth and densify the set of vertices used
to represent a line. A spline is a set of polynomial
functions that join smoothly (Figure
4-15). Polynomial functions are fit to successive
sets of points along the vertices in a
line; for example, a function may be fit to
points 1 through 5, and a separate polynomial
function fit to points 5 through 11 (Figure
4-15). Constraints force these functions
to connect smoothly, usually by requiring
the first and second derivatives of the functions
to be continuous at the intersection
point. This means the lines have the same
slope at the intersection point, and the slope
is changing at the same rate for both lines at
the intersection point. Once the spline functions
are calculated, they may be used to add
vertices. For example, several new vertices
may be automatically placed on the line
between digitized vertices 8 and 9, leading
to the "smooth" curve shown in Figure 4-15.
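One way to sketch spline densification is with a natural cubic spline, which fits a separate cubic between each pair of digitized vertices and forces the first and second derivatives to match where the pieces join. This differs in detail from the grouping of points shown in Figure 4-15 and is an assumption made for illustration; the function names, the chord-length parameterization, and the zero end-curvature condition are choices of this sketch, and consecutive vertices are assumed to be distinct.

```python
import math


def second_derivatives(t, y):
    """Second derivatives at the knots of a natural cubic spline (zero at both ends)."""
    n = len(t) - 1                      # number of segments
    m = [0.0] * (n + 1)
    if n < 2:
        return m                        # a single segment stays a straight line
    h = [t[i + 1] - t[i] for i in range(n)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm for the tridiagonal system: eliminate, then back-substitute.
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    m[n - 1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m


def evaluate(t, y, m, i, u):
    """Value of the cubic on segment i at parameter u, with t[i] <= u <= t[i + 1]."""
    h = t[i + 1] - t[i]
    a, b = t[i + 1] - u, u - t[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (y[i] / h - m[i] * h / 6.0) * a
            + (y[i + 1] / h - m[i + 1] * h / 6.0) * b)


def densify(points, per_segment=4):
    """Insert per_segment - 1 spline-interpolated vertices between digitized vertices."""
    t = [0.0]
    for p, q in zip(points, points[1:]):
        t.append(t[-1] + math.dist(p, q))   # cumulative chord length as the parameter
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = second_derivatives(t, xs), second_derivatives(t, ys)
    out = []
    for i in range(len(points) - 1):
        for k in range(per_segment):
            u = t[i] + (t[i + 1] - t[i]) * k / per_segment
            out.append((evaluate(t, xs, mx, i, u), evaluate(t, ys, my, i, u)))
    out.append(points[-1])
    return out


digitized = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
for vertex in densify(digitized, per_segment=4):
    print(f"{vertex[0]:.3f}, {vertex[1]:.3f}")
```

The densified line still passes through every digitized vertex; only the path between them is replaced by a smooth curve.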

Data may also be digitized with too 
many vertices. High densities often come 
from stream mode digitizing, spline or
smoothing functions, or classified image
data and then raster-to-vector conversion.
Point thinning removes redundant vertices
without sacrificing spatial accuracy, which
reduces file size and improves processing
speed.

Many point thinning methods use a perpendicular
"weed" distance, measured from
a spanning line, to identify redundant points.
The Lang method exemplifies this approach.
A spanning line connects two nonadjacent
vertices in a line. A predetermined number
of vertices is spanned. The initial spanning
number has been set to 4 in Figure 4-16,
meaning four points will be considered at
each starting point. Areas closer than the
weed distance are shown in gray in the figure.
A straight line is drawn between a starting
point and the search endpoint, the fourth
point down the line (Figure 4-16). Any
intermediate points that are closer than the
weed distance are marked for removal. In
Figure 4-16a, no points are within the weed
distance; therefore, none are marked. The
endpoint is then moved to the next closest
remaining point (Figure 4-16b), and all
intermediate points are tested for removal.
Again, any points closer than the weed distance
are marked for removal, shown as
open circles. Note that in Figure 4-16, one
point is within the weed distance and is
removed. Once all points in the initial spanning
distance are checked, the last remaining
endpoint becomes the new starting point,
and a new spanning line is drawn to connect
4 points (Figure 4-16c, d).

The process may be repeated for successive
sets of points in a line segment until all
vertices have been evaluated (Figure 4-16e
to h). All close vertices are viewed as not
recording a significant change in the line
shape, and hence are expendable. A balance 
must be struck between the removal of 
redundant vertices and the loss of shape- 
defining points, usually through a careful set 
of test cases with successively larger weed 
distances. 
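A spanning-line thinning routine in the spirit of the Lang method can be sketched as below. This follows one common formulation, in which the search endpoint is pulled back one vertex at a time until every intermediate vertex lies within the weed distance of the spanning line; it may differ in detail from the steps drawn in Figure 4-16. The names `perp_distance` and `lang_thin`, and the sample line, are hypothetical.

```python
import math


def perp_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def lang_thin(points, weed_distance, span=4):
    """Thin a line; span is the number of vertices covered by the first spanning line."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    start, last = 0, len(points) - 1
    while start < last:
        end = min(start + span - 1, last)
        # Shorten the spanning line until all intermediate vertices are within
        # the weed distance, or until no intermediate vertices remain.
        while end > start + 1 and any(
            perp_distance(points[i], points[start], points[end]) > weed_distance
            for i in range(start + 1, end)
        ):
            end -= 1
        kept.append(points[end])   # intermediate vertices are dropped as redundant
        start = end
    return kept


line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2), (5, 2.05), (6, 2)]
print(lang_thin(line, weed_distance=0.1, span=4))
```

Running the routine with successively larger weed distances on representative lines is a practical way to find the point at which shape-defining vertices start to disappear.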

There are many variants on this basic 
concept. Some look only at three immediately
adjacent points, testing the middle
point against the line spanned by its two
neighboring points. Others constrain or
expand the search based on the complexity
of the line. Rather than always looking at
four points, as in our example above, more
points are scrutinized when the line is not
complex (nearly straight), and fewer when
the line is complex (many changes in direction).

Figure 4-16: The Lang method. In the Lang method, vertices are
removed when they are within the weed distance of a spanning line
(from Weibel, 1997).


Scan Digitizing 


Optical scanning is another method for 
converting hardcopy documents into digital 
formats. Scanners have elements that emit 
and sense light. Most scanners pass light 
over a map, precisely locating reflectance to
identify features. Scan digitizing usually
requires some form of skeletonizing, or line
thinning, particularly if the data are to be
converted to a vector data format. Scanned
lines are often wider than a single pixel (Figure
4-17). A pixel near the "center" of the
point or line is typically chosen, e.g., nearest
the center of the local perpendicular bisector
of the line. Skeletonizing reduces the widths
of lines or points to a single pixel.


Editing Geographic Data 

Spatial data may be edited, or changed,
for several reasons. Errors and inconsistencies
are inevitably introduced during spatial
data entry. Undershoots, overshoots, missing
or extra lines, and missing or extra points or
labels are all errors that must be corrected.
Spatial data can change over time. Parcels
are subdivided, roads extended or moved,
forests grow or are cut, and these changes
may be entered in the spatial database
through editing.

Software helps operators identify potential
errors. Line nodes may be classified as
connecting or dangling. A connecting node
joins two or more lines, while a dangling
node is attached to only one line. Some dangling
nodes may be intentional, for example,
a cul-de-sac in a street network, while others
will be the result of under- or overshoots.
Dangling nodes can be quickly evaluated 
and, if appropriate, corrected. 
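Classifying nodes as connecting or dangling amounts to counting how many line ends share each node. The sketch below assumes lines are stored as lists of coordinate tuples and that connected lines have exactly equal node coordinates (for example, after snapping); `dangling_nodes` and the sample network are hypothetical.

```python
from collections import Counter


def dangling_nodes(lines):
    """Return nodes (line end points) that are attached to only one line end."""
    ends = Counter()
    for line in lines:
        ends[line[0]] += 1    # starting node
        ends[line[-1]] += 1   # ending node
    return sorted(node for node, count in ends.items() if count == 1)


network = [
    [(0, 0), (5, 1), (10, 0)],   # street with one interior vertex
    [(10, 0), (10, 10)],         # side street
    [(10, 0), (20, 0)],          # continuation of the first street
]
print(dangling_nodes(network))
```

The flagged nodes are only candidates: a cul-de-sac is a legitimate dangle, so an operator still reviews each one before correcting it.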

Attribute consistency may also be used 
to identify errors. Operators note areas in 
which contradictory theme types occur in 
different data layers. The two layers are
either graphically or cartographically overlain.
Contradictory co-occurrences are identified,
such as water in one layer and upland
areas in a second. These contradictions are
then resolved either manually, or automatically
via some predefined precedence hierarchy.

Many GIS software packages provide a
comprehensive set of editing tools (Figure 4-18).
Editing typically includes the ability to
select, split, update, and add features. Selection
may be based on geometric attributes, or
with a cursor guided by the operator. Selections
may be made individually, by geographic
extent (e.g., select all features in a
box, circle, or within a certain distance of the
pointer), or by geometric attributes (e.g.,
select all nodes that connect to only one
line). Once a feature is selected, various
operations may be available, including erasing
all or part of the feature, changing the
coordinate values defining the feature, and
in the case of lines, splitting or adding to the
feature. A line may be split into parts, either
to isolate a segment for future deletion, or to
modify only a portion of the line. Coordinates
are typically altered by interactively
selecting and dragging points, nodes, or vertices
to their best shape and location. Points
or line segments are added as needed.

Figure 4-17: Scanned lines before thinning and after thinning.


Groups of features in an area may be
adjusted together.
Polynomial equations are often used due to
their flexibility and ease of application.
Anchor points are selected, again on the
graphics screen, and other points are
selected by dragging interactively on the
screen to match point locations. All lines and
points except the anchor points are interactively
adjusted.

All edits should be made with due attention
to distance shifts during editing. On-screen
editing to eliminate undershoots
should only be performed when the "true"
locations of features may be identified accurately,
and the new features can be confidently
placed in the correct location.
Automatic removal of short undershoots
may be performed without introducing additional
spatial error in most instances. A short
distance for an undershoot is subjectively
defined, but typically it is below the error
inherent in the source materials, or at least a
distance that is insignificant when considering
the intended use of the spatial data.


Slivers

Slivers are particularly pernicious errors
that often come from poor digitizing. These
are small gaps or overlaps along shared
polygon boundaries (Figure 4-19). Slivers
are a problem because they leave empty gaps,
which misrepresent adjacency among
polygons, or overlaps, which create spurious
polygons that don't represent any real features.

Figure 4-19: Gaps and overlaps in polygon boundaries
result in slivers, which usually must be
removed before the data are used.


Slivers are also undesirable because
each sliver polygon adds an entry to
the attribute table. Sliver polygons may be
quite numerous, in extreme cases outnumbering
the true polygons in a data layer. This
adds storage and processing overhead but no
value to the data set.

Slivers most often occur when snap dis- 
tances are set too short, so that nodes aren't 
automatically drawn together while digitiz- 
ing. Slivers may also occur when snapping 
only to nodes, and not lines or edges, particularly
when there are relatively long, straight
arcs, e.g., for many property boundary layers.
Slivers may also be frequent when the
operator is inattentive, tired, or distracted.

Topological digitizing avoids most slivers,
particularly when following the best
practice of never digitizing the same line
twice. Digitizing and editing tools, described
in the previous sections, usually include several
auto-complete options. These incorporate
existing polygon edges when digitizing
an adjacent polygon, automatically closing a
new polygon boundary. There can be no
gaps or overlaps, because the tool uses the
existing edge to help create a new polygon.
These tools make digitizing both more
efficient and more accurate.


Features Common to Several 
Layers 

One common problem in digitizing 
derives from the representation of features that
occur in multiple data layers or images.
These features rarely have identical locations
on each source, and often occur in different
locations when digitized into their
respective data layers (Figure 4-20). For
example, water boundaries on soil survey
maps rarely correspond exactly to water 
boundaries found on hydrographic maps. 

Features may appear differently on dif- 
ferent maps for many reasons. Perhaps the. 


Figure 4.20: Common festes may be spatially засаа in different spatial ма layers, 
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maps were made for different purposes or at 
different times. Features may differ because. 
the maps were from different source materi- 
als; for example, one map may have been 
based on ground surveys while another was 
‘based on aerial photographs. Digitizing can 
also compound the problem due to differ- 
‘ences in digitizing methods or operators. 
There зге several ways to remove this 
"common feature” inconsistency. One 
involves removing inconsistencies while re- 
drafting the data from conflicting sources 
into a new base layer. Redraftng is labor. 
intensive and time consuming, but forces a 
resolution of inconsistent boundary loca- 
tions, Redrafting also allows several inputs 
to be combined into a single data layer. 


A second, often preferable method 
involves establishing a “master” boundary 
that is the highest accuracy composite of the 
available data sets. A digital copy or overlay
operation establishes the common features
as a base in all the data layers, and this base
may be used as each new layer is produced.
For example, water boundaries might be
extracted from the soil survey and USGS
sources, and these data combined in a third
data layer. The third data layer would be
edited to produce a composite, high-quality
water layer. The composite water layer
would then be copied back into both the soils
and USGS layers. This second approach,
while resulting in visually consistent spatial
data layers, is in many instances only a cosmetic
improvement of the data. If there are
large discrepancies ("large" is defined relative
to the required spatial data accuracy),
then the source of the discrepancies should
be identified and the most accurate data
used, or new, higher-accuracy data collected 
from the field or original sources. 


Map Boundaries and Spatial Data 


Digital data are often created or delivered
in tiles, giving them edges. Discontinuities
often occur at tile boundaries. These
errors may be lacking in many newer, large-area
data collected with digital methods, but
edges will be encountered and should be
understood.


There is a trade-off between data vol- 
umes, resolution, and area coverage, result- 
ing in data often split into tiles for delivery. 
Consider a statewide elevation raster, gener- 
ally useful, but cumbersome given that most 
users desire detailed data over a small subset 
of the state. A data request may result in several
tiles, which then need to be recombined
before analysis. High-resolution images or
other raster data may challenge the transfer,
storage, or display processing capabilities of
common computers, and so data are broken
into more manageable pieces for delivery. 


Data may also have edges due to differences
in the time of data collection for adjacent
areas. For example, digital aerial
photographs are collected for most of the
agricultural lands of the U.S. on an annual
basis. Data are collected in blocks over the
growing season, and because of weather,
schedules, and equipment failures, several
days to weeks may pass between data collection
on adjacent blocks. Many features, such
as crop type, stage of development, wetland 
size, or harvest state may have changed 
between image collection dates and be dis- 
continuous or inconsistent across blocks, 
and hence the data layer. 


Different interpreters may also create 
discontinuities. Large-area mapping proj- 
ects typically employ several interpreters, 
each working on different areas of a region. 
All professional, large-area mapping efforts 
should have protocols specifying the scale, 
sources, equipment, methods, classification, 
keys, and cross-correlation to ensure consistent
mapping across block or data tile boundaries.
In spite of these efforts, some
differences due to human interpretation
occur. Feature placement, category assignment,
and generalization vary among interpreters.
These problems are compounded
when extensive checking and guidelines are
not enforced across the project, especially
when adjacent areas are mapped at different
times or by two different organizations. 


Digital Data Output 


Once data are created, we often must 
transfer digital data to another user. Given 
the number of different GIS software packages,
operating systems, and computer types, transferring
data is not always a straightforward
process. Digital data output typically
includes two components, the data themselves
in some standard, defined format, and
metadata, or data about the digital data. We
will describe data formats and metadata in
turn.

Digital data are the data in some electronic
form. As described at the end of the
first chapter, there are many file formats, or
ways of encoding the spatial and attribute
data in digital files. Digital data output often
consists of recording or converting data into
one of these file formats. These data are typically
converted with a utility, tool, or option
available in the data development software
(Figure 4-21). The most useful of these utilities
support a broad range of input and output
options, each fully described in the
program documentation.

A common contemporary format is the
Geography Markup Language (GML). This
is an extension of XML for geographic features;
XML is in turn the lingua franca for
human- and machine-readable documents. As
with most XML, there are two parts for any
GML dataset: a schema that describes the
document, and the document containing the
geographic data. GML is a standard, but
there can be many extensions, so a community
of users can extend the standard with
additional features, and document the extension
in a standard way.
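To make the document half of a GML dataset concrete, the sketch below parses a tiny GML-style fragment with Python's standard XML library. The `gml` namespace URI and the `gml:Point`, `gml:pos`, and `srsName` names come from the GML standard; the `ex` namespace, the `Hydrant` feature, its `hydrantType` property, and the coordinates are hypothetical, standing in for a community extension that a separate schema would describe.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# A hypothetical feature in a made-up application namespace (ex), with a
# standard GML point geometry in a projected coordinate system.
document = """<ex:Hydrant xmlns:ex="http://example.com/hydrants"
            xmlns:gml="http://www.opengis.net/gml">
  <ex:hydrantType>dry barrel</ex:hydrantType>
  <gml:Point srsName="EPSG:26915">
    <gml:pos>481250.0 4980310.0</gml:pos>
  </gml:Point>
</ex:Hydrant>"""

root = ET.fromstring(document)
point = root.find(f"{{{GML_NS}}}Point")
x, y = (float(v) for v in point.find(f"{{{GML_NS}}}pos").text.split())
print(point.get("srsName"), x, y)
```

Because the geometry uses the shared GML vocabulary, software that has never seen the hypothetical `Hydrant` extension can still locate and read the coordinates.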

There are many legacy digital data 
transfer formats that were widely used 
before GML. GML replaced a withdrawn 
standard, the Spatial Data Transfer Standard 
(SDTS), with translators available to or from 
this older format. There are several U.S.
Geological Survey formats for the transfer of
digital elevation models or digital vector
data, or software-specific formats, such as an
ASCII format (GEN/UNGEN) that was
developed by ESRI. These were useful for a
limited set of transfers, but shortcomings in
each of these transfer formats led to the
development of subsequent standards. These
formats are not common, but sometimes
arise in converting older data sets.


Metadata: Data Documentation 


Metadata are information about spatial 
data. Metadata describe the content, source,
lineage, methods, developer, coordinate system,
extent, structure, spatial accuracy,
attributes, and responsible organization for
spatial data. 

Metadata are required for the effective 
use of spatial data. Metadata allow the effi- 
cient transfer of information about data, and 
inform new users about the geographic
extent, coordinate system, quality, and other
data characteristics. Metadata aid organizations
in evaluating data to determine if they
are suitable for an intended use, e.g., to
review accuracy, coverage, or information
needs. Metadata may also aid in data updates
by guiding the choice of appropriate collection
methods and formats for new data.

Most governments have or are in the
process of establishing standard methods for
reporting metadata. In the United States, the
Federal Geographic Data Committee
(FGDC) has defined a Content Standard for
Digital Geospatial Metadata (CSDGM) to
specify the content and format for metadata.
The CSDGM ensures that spatial data are
clearly described so that they may be used
effectively within an organization. The use
of the CSDGM also ensures that data may be
described to other organizations in a standard
manner, and that spatial data may be
more easily evaluated by and transferred to
other organizations.


The CSDGM consists of a standard set
of elements that are presented in a specified
order. The standard is flexible and may be
extended to include new elements for new
categories of information in the future.
There are over 330 different elements in the
CSDGM. Elements have standardized long
and short names and are provided in a standard
order with a hierarchical numbering
system. For example, the westernmost
bounding coordinate of a data set is element
1.5.1.1, defined as follows:

1.5.1.1 West Bounding Coordinate - westernmost
coordinate of the limit of coverage
expressed in longitude.
Type: real
Domain: -180.0 <= West Bounding Coordinate < 180.0
Short Name: westbc

The numbering system is hierarchical.
Here, 1 indicates it is basic identification
information, 1.5 indicates identification
information about the spatial domain, 1.5.1
is for bounding coordinates, and 1.5.1.1 is
the westernmost bounding coordinate.

4. Spatial Reference Information:
4.1 Horizontal Coordinate System Definition:
4.1.2.2 Grid Coordinate System:
4.1.2.2.1 Grid Coordinate System Name: Universal Transverse Mercator
4.1.2.2.2 Universal_Transverse_Mercator:
4.1.2.2.2.1 UTM Zone Number: 10-19
4.1.2.4 Planar Coordinate Information:
4.1.2.4.1 Planar Coordinate Encoding Method: coordinate pair
4.1.2.4.2 Coordinate Representation:
4.1.2.4.2.1 Abscissa Resolution: 2.54
4.1.2.4.2.2 Ordinate Resolution: 2.54
4.2.1.4 Altitude Encoding Method: attribute values
4.2.2 Depth System Definition:
4.2.2.1 Depth Datum Name: Mean lower low water
4.2.2.2 Depth Resolution: 1
4.2.2.3 Depth Distance Units: meters or feet
4.2.2.4 Depth Encoding Method: attribute values

Figure 4-22: Example of a small portion of the FGDC recommended
metadata for a 1:100,000 scale derived digital data set.
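The hierarchical numbering and the domain given for element 1.5.1.1 lend themselves to simple automated checks. The sketch below is an illustration only, not part of any FGDC tool; the function names `element_ancestors` and `valid_westbc` are hypothetical.

```python
def element_ancestors(number):
    """Split a hierarchical element number such as '1.5.1.1' into its ancestor chain."""
    parts = number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def valid_westbc(value):
    """Check the stated domain: -180.0 <= West Bounding Coordinate < 180.0."""
    return -180.0 <= value < 180.0


print(element_ancestors("1.5.1.1"))
print(valid_westbc(-93.3), valid_westbc(180.0))
```

Each entry in the ancestor chain names a progressively narrower group of elements, from basic identification information down to the single bounding coordinate.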

There are 10 basic types of information
in the CSDGM:

1) identification, describing the data set,

2) data quality,

3) spatial data organization,

4) spatial reference coordinate system,

5) entity and attribute,

6) distribution and options for obtaining
the dataset,

7) currency of metadata and responsible
party,

8) citation,

9) time period information used with other
sections to provide temporal information,
and

10) contact organization or person.

The CSDGM is a content standard and
does not specify the format of the metadata.
As long as the elements are included, properly
numbered, and identified with correct
values describing the data set, the metadata
are considered to conform with the CSDGM.
Because metadata may be quite complex,
there are a number of conventions in the presentation
of metadata. These conventions
seek to ensure that metadata are presented in
a clear, logical way to humans, and are also
easily ingested by computer software. There
is a Standard Generalized Markup Language
(SGML) for the exchange of metadata. An
example of a portion of the metadata for a
1:100,000 scale digital line graph data set is
shown in Figure 4-22.

Metadata are most often created using
specialized software tools. There are often
complex linkages between metadata elements,
and some elements are repeated or
redundant. Software tools often ease metadata
entry by copying across redundancies,
ensuring correct linkages, and checking elements
for contradictory information or
errors. Metadata are most easily and effectively
produced when their development is
integrated into the workflow of data production.

Although not all organizations in the
United States adhere to the CSDGM metadata
standard, most organizations record and
organize a description and other important
information about their data, and many organizations
consider a data set incomplete if it
lacks metadata.

There is a parallel effort to develop and
maintain international standards for metadata.
The standards are known as the ISO
19115 International Standards for Metadata.
According to the International Organization
for Standardization, ISO 19115 "defines the
schema required for describing geographic
information and services. It provides information
about the identification, the extent,
the quality, the spatial and temporal schema,
spatial reference, and distribution of digital
geographic data."

Cartography and Map Design


Cartography is the art and techniques of
making maps. It encompasses both graphic
tools and how these tools may be combined
to communicate spatial information. Cartography
is a discipline of much depth and
breadth, and there are many books, journals,
and societies devoted to the science and art
of cartography. Our aim in the next few
pages is to provide a brief overview of cartography
with a particular focus on map
design. This is both to acquaint new students
with the most basic concepts of cartography,
and to help them apply these concepts in the
consumption and production of spatial information.
Readers interested in a more complete
treatment should consult the references
listed at the end of this chapter.

A primary purpose of cartography is to
communicate spatial information. This
requires identifying the

-intended audience,
-information to communicate,
-area of interest, and
-physical and resource limitations;

in short, the whom, what, where, and how
we may present our information.

These considerations drive the major
cartographic design decisions when we
make a map. We must consider the

-scale, size, shape, and other general
map properties,
-data to plot,
-symbol shapes, sizes, or patterns,
-labeling, including font type and size,
-legend properties, size, and borders, and
-the placement of all these elements on
a map.

Map scale, size, and shape depend primarily
on the intended map use. Wall maps
for viewing at a distance may have few, large,
boldly colored features. Hand-held street
maps are more detailed, to be viewed at
short ranges, and have a rich set of additional
tables, lists, or other features.


Map scale is often determined in part by
the size of the primary objects we wish to
display, and in part by the most appropriate
media sizes, such as the page or screen size
possible for a document. As noted earlier,
the map scale is the ratio of lengths on a map
to true lengths. If we wish to display an area
that spans 25 km (25,000 m) on a screen that
spans 25 cm (0.25 m), the map scale will be
near 0.25 to 25,000, or 1:100,000. This decision
on size, area, and scale then drives further
map design. For example, scale limits
the features we may display, and the size,
number, and labeling of features. At a
1:100,000 scale we may not be able to show
all towns, as there may be too many to fit at
a readable size.
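
The scale arithmetic in the example above can be checked with a few lines of code. This is only a sketch; the function name is an illustrative choice, and both lengths are assumed to be in the same units (meters here).

```python
def scale_denominator(ground_length_m, map_length_m):
    """Return D for a map scale of 1:D, given matching ground and map lengths."""
    return ground_length_m / map_length_m


# A 25 km wide area shown on a 25 cm wide screen.
denominator = scale_denominator(25_000, 0.25)
print(f"1:{denominator:,.0f}")  # prints 1:100,000
```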


Maps typically have a primary theme or
purpose that is determined by the intended
audience. General purpose maps typically
have a wide range of features represented,
including networks, towns,
elevation, or other common features (Figure
4-23a). Special purpose maps, such as road
maps, focus on a more limited set, in this
instance road locations and names, town
names, and large geographic features (Figure
4-23b).


Map Types

Many types of digital and hardcopy
maps are produced, and the types are often
named by how objects are depicted. Feature
maps are commonly used for points, lines,
areas, and nominal information (Figure
4-24, upper left). No attempt is made for
symbols to represent true scale. A road may
be plotted with a symbol defining the type of
road, but the width of the road as plotted is
not true to scaled size on the ground.

Choropleth maps depict quantitative
information for areas. A mapped variable
such as population density may be represented
in the map (Figure 4-24, top right).
Polygons define area boundaries, such as
counties or census tracts. Each polygon is
given a color, shading, or pattern corresponding
to values for a mapped variable.

Dot density maps show quantitative data
(Figure 4-24, bottom left). Dots or other
point symbols are plotted to represent values.
Dots are randomly placed in the polygon
such that the number of dots equals the
total value for the polygon. Each dot on the
example map represents 50,000 people;
however, each point is not a city or other
concentration of inhabitants. Note the position
of points in the dot density map relative
to the city locations in the feature map
directly above it in Figure 4-24.
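
The random placement of dots can be sketched as follows. This is one simple way to do it, not necessarily how any particular GIS package does: it assumes a simple polygon given as a vertex list, uses a standard ray-casting point-in-polygon test, and rejects random candidates that fall outside the polygon. The polygon, population, and function names are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True when (x, y) lies inside the polygon's vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count the edges crossed by a horizontal ray cast to the right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density_points(polygon, total_value, value_per_dot, seed=0):
    """Place round(total_value / value_per_dot) random dots inside the polygon."""
    rng = random.Random(seed)
    n_dots = round(total_value / value_per_dot)
    xs = [vertex[0] for vertex in polygon]
    ys = [vertex[1] for vertex in polygon]
    dots = []
    while len(dots) < n_dots:
        # Draw a candidate in the bounding box; keep it only if it is inside.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# Hypothetical county outline with 600,000 people, at 50,000 people per dot.
county = [(0, 0), (4, 0), (4, 3), (2, 5), (0, 3)]
dots = dot_density_points(county, total_value=600_000, value_per_dot=50_000)
print(len(dots))  # 12 dots
```

Because the positions are random, a dot carries no information about where within the polygon the people actually live, which is the caution raised in the paragraph above.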

Isopleth maps, also known as contour
maps, display lines of equal value (Figure
4-24, bottom right). Isopleth maps are used to
represent continuous surfaces. Rainfall, elevation,
and temperature are features that are
commonly represented using isopleth maps.
A line on the isopleth map represents a specified
value, for example, a 10°C isopleth
defines the position on the landscape at that
temperature. Lines typically do not cross, in
that there cannot be two different temperatures
at the same location. However, isopleths
often depict elevation, and cliffs or
overhanging terrain do have multiple elevations
at the same location. In this case, the
lower elevations typically pass "under" the
higher elevations, and the isopleth is labeled
with the tallest height (Figure 4-25).

Once the features to include on a map
are defined, we must choose the symbols
used to draw them. Symbology depends in
part on the type of feature. For example, we
have a different set of options when representing
continuous features such as elevation
or pollution concentration than when representing
discrete features. We also must
choose among symbols for each of the types
of discrete features: for example, the set of
symbols for points is generally different
from those for line or area features.


Figure 4-23: Example of (a) a detailed, general-purpose map, here a portion of a United States Geological ...

Figure 4-24: Common hardcopy map types depicting the northeastern United States.


Symbol size is an important attribute of
map symbology, often specified in a unit
called a point. One point is approximately
equal to 0.353 mm, or about 1/72 of an
inch. A specific point number is most often
used to specify the size of symbols, for
example, the dimensions of small squares to
represent houses on a map, or the characteristics
of a specific pattern used to fill areas
on a map. A line width may also be specified
in points. Setting a line width of two points
means we want that particular line plotted
with a width of about 0.71 mm. It is unfortunate
that "point" is both the name of the distance
unit and a general property of a geographic
feature, as in "a tree is a point feature." This
forces us to talk about the "point size" of
symbols to represent points, lines, or area
fills or patterns, but if we are careful, we
may communicate these specifications
clearly.
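
A small conversion helper makes point sizes easier to reason about. The sketch below assumes the common desktop-publishing definition of 1 point = 1/72 inch (about 0.353 mm); that definition and the function name are assumptions of this example, and other point definitions would give slightly different values.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # assumption: desktop-publishing point


def points_to_mm(points):
    """Convert a symbol size or line width in points to millimeters."""
    return points * MM_PER_INCH / POINTS_PER_INCH


for size in (0.2, 0.5, 2):
    print(f"{size} pt = {points_to_mm(size):.3f} mm")
```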

The best size, pattern, shape, and color
used to symbolize each feature depend on
the viewing distance; the number, density,
and type of features; and the purpose of the
map. Generally, we use larger, bolder, or
thicker symbols for maps to be viewed from
longer distances, while we may reduce symbol
sizes when producing maps for viewing at
about 50 cm (20 in). Most people with normal vision
under good lighting may resolve lines down
to near 0.2 points at close distances, provided
the lines show good contrast with the
background. Although size limits depend
largely on background color and contrast,
point features are typically not resolvable at
sizes smaller than about 0.5 points, and distinguishing
between shapes is difficult for
point features smaller than approximately 2
points in their largest dimension.

The pattern and color of symbols must
also be chosen, generally from a set provided
by the software (Figure 4-26). Symbols
generally distinguish among feature
types by their characteristics, and although most
symbols are not associated with a feature
type, some are, such as plane outlines for
airports, numbered shields for highways, or
a hatched line for a railroad.

We also must often choose whether and
how to label features. Most GIS software
provides a range of tools for creating and
placing labels, and in all cases we must
choose the label font type and size, location
relative to the feature, and orientation. Primary
considerations when labeling point
features are label placement relative to the
point location, label size, and label orientation
(Figure 4-27). We may also use graduated
labels, that is, resize them according to
some variable associated with the point feature.
For example, it is common to have
larger features and label fonts for larger cities
(Figure 4-27). Labels may be bent,
angled, or wrapped around features to
improve clarity and more efficiently use
space in a map.

Label placement is very much an art,
and there is often much individual editing
required when placing and sizing labels for
finished maps. Most software provides for
automatic label placement, usually specified
relative to feature location. For example, one
may specify labels above and to the right of
all points, or line labels placed over line features,
or polygon labels placed near the polygon
centroid. However, these automatic
placements may not be satisfactory because
labels may overlap, labels may fall in cluttered
areas of the map, or features associated
with labels may be ambiguous. Some software
provides options for automatic label
placement, including removal or movement
of overlapping labels. These often reduce
manual editing, but sometimes increase it.
Figure 4-28 shows a portion of a map of
southern Finland. This region presents several
mapping problems, including the high
density of cities near the upper right, an
irregular coastline, and dense clustering of
islands along the coast. Most labels are
placed above and to the right of their corresponding
city; however, some are moved or
angled for clarity. Cities near the coast show
both, to avoid labels crossing the water/land
boundary where practical. Semitransparent
background shading is added for Parainen
and Hanko, cities placed in the island matrix.
This example demonstrates the individual
editing often required when placing labels.

Most maps should have legends. The
legend identifies map features succinctly and
describes the symbols used to depict those
features. Legends often include or are
grouped with additional map information,
such as scale bars, north arrows, and descriptive
text. The cartographer must choose the
size and shape of the descriptive symbol,
and the font type, size, and orientation for
each symbol in the legend. The primary goal
is to have a clear, concise, and complete legend.


The kind of symbols appropriate for
map legends depends on the types of features
depicted. Different choices are available
for point, line, and polygon features, or
for continuously variable features stored as
rasters. Most software provides a range of
legend elements and symbols that may be
used. Typically, these tools allow a wide
range of symbolizations and compact ways
of describing the symbolization in a legend
(Figure 4-29).
The specific layout of legend features
must be defined: for example, the point feature
symbol size may be graduated based on
some attribute for the points. Successively
larger features may be assigned for successively
larger cities. This must be noted in the
legend, and the symbols nested, shown
sequentially, or otherwise depicted (Figure
4-29, top left).


The legend should be exhaustive. Examples
of each different symbol type that
appears on the map should appear in the legend.
This means each point, line, or area
symbol is drawn in the legend with some
descriptive label. Labels may be next to,
wrapped around, or embedded within the
features, and sometimes descriptive numbers
are added, for example, a range of continuous
variables (Figure 4-29, upper left). Scale
bars, north arrows, and descriptive text
boxes are typically included in the legend.


Map composition or layout is another
primary task. Composition consists of determining
the map elements, their size, and
their placement. Typical map elements,
shown in Figure 4-3 and Figure 4-4, include
one or more main data panes or areas, a legend,
a title, a scale bar and north arrow,
a grid or graticule, and descriptive
text. These each must be sized and placed on
the map.

These map elements should be positioned
and sized in accordance with their
importance. The map's most important data
pane should be largest, and it is often centered
or otherwise given visual dominance.
Other elements are typically smaller and
located around the periphery or embedded
within the main data pane. These other elements
include map insets, which are smaller
data panes that show larger or smaller scale
views of a region in the primary data pane.
Good map compositions usually group
related elements and use empty space effectively.
Data panes are often grouped and legend
elements placed near each other, and
grouping is often indicated with enclosing
boxes.


Neophyte cartographers should avoid
two tendencies in map composition, both
depicted in Figure 4-30. First, it is generally
easy to create a map with automatic label
and legend generation and placement. The
map shown at the top of Figure 4-30 is typical
of this automatic composition, and
includes poorly placed legend elements and
too small, poorly placed labels. Labels
crowd each other, are ambiguous, and cross
water/land or other feature boundaries, and
fonts are poorly chosen. You should note
that automatic map symbol selection and
placement is nearly always suboptimal, and
the novice cartographer should scrutinize
these choices and manually improve them.


The second common error is poor use of
empty space, those parts of the map without
map elements. There are two opposite tendencies:
either to leave too much or unbalanced
empty space or to clutter the map in an
attempt to fill all empty space. Note that the
map shown at the top of Figure 4-30 leaves
large empty spaces on the left (western)
edge, with the Atlantic Ocean devoid of features.
The cartographer may address this in
several ways: by changing the size, shape, or
extent of the area mapped; by adding new
features, such as data panes as insets, additional
text boxes, or other elements; or by
moving the legend or other map elements to
that space. The map shown at the bottom of
Figure 4-30, while not perfect, fixes these
design flaws in part by moving the legend
and scale bar, and in part by adding labels
for the Atlantic Ocean and Mediterranean
Sea. The empty space is more balanced in
that it appears around the major map elements
in approximately equal proportions.

As noted earlier, this is only a brief
introduction to cartography, a subject covered
by many good books, some listed at the
end of this chapter. Perhaps the best compendium
of examples is the Map Book
Series, by ESRI, published annually since
1984. Examples are available at the time of
this writing at www.esri.com/mapmuseum.
You should leaf through several volumes in
this series, with an eye toward critical map
design. Each volume contains many beautiful
and informative maps, and provides techniques
worth emulating.

Coordinate Transformation


Coordinate transformation was once a
common operation in the development of
spatial data for GIS. It is becoming less frequently
needed as we have worked through
digitizing the legacy paper maps on which
we stored most of our spatial data, and new
image sources now routinely correct the geometric
distortion inherent in images. However,
some older maps still remain to be
converted, and some image systems generate
images with acceptable geometric accuracies,
although not ortho-corrected. Coordinate
transformations are covered here for
completeness.


A coordinate transformation aligns spatial
data into an Earth-based map coordinate
system. This alignment ensures features fall
in their proper relative position when digital
data from different layers are combined.
Within the limits of data accuracy, a good
transformation helps avoid inconsistent spatial
relationships such as farm fields on freeways,
roads under water, or cities in the
middle of swamps, except where these truly
exist. Coordinate transformation is also
referred to as registration, because it "registers"
the layers to a map coordinate system.

Coordinate transformation is most commonly
used to convert newly digitized data
from an arbitrary image, digitizer, or scanner
coordinate system to a standard map coordinate
system. The input coordinate system is
usually based on the input device. An image
may be scanned and coordinates recorded as
a cursor is moved across the image surface.
These coordinates are usually recorded in
pixel, inch, or centimeter units relative to an
origin located near the lower left corner of
the image. Before these newly digitized data
may be used with other data, these "inch-space"
or "digitizer" coordinates must be
transformed into an Earth-based map coordinate
system.

Figure 4-31 depicts the application of a
coordinate transformation in data development.
Early surveys were often stored on
paper maps; in this instance, the original
PLSS surveys. These depict the original
PLSS boundary lines, as well as lakes and
wetlands, and in some instances forests,
grasslands, or other features. We may wish
to compare past and current conditions, but
the PLSS is a system of land division, with
no coordinates or projection associated with
the lines. We need to register the paper maps
to a projected coordinate system prior to use.

The PLSS line intersections may be
used to register the original maps to a projected
coordinate system. As noted in Chapter
3, PLSS lines often became property
boundaries, and subsequent roads often followed
these boundaries. Section line intersections
may be surveyed directly, or
extracted from other registered data such as
aerial images, and hence used to transform
the original maps to a projected coordinate
system. Features on the original maps, such
as lakes or wetlands at the time of the original
surveys, may be digitized and compared
to current ones.


Control Points 


A set of control points is used to transform
the digitized data from the digitizer or
photo coordinate system to a map-projected
coordinate system. These control points are
used to estimate equations that we use for
the coordinate transformation (Figure 4-32).
Control points are different from other digitized
features. When we digitize most points,
lines, or areas, we do not know the map projection
coordinates for these features. We
simply collect the digitizer X and Y coordinates
that are established with reference to
some arbitrary origin on an image or digitizing
table. Control points differ from other
digitized points in that we know both the
map projection coordinates and the digitizer
coordinates for these points.

These two sets of coordinates for each
control point, one for the map projection and
one for the digitizer system, are used to estimate
the coefficients for transformation
equations, usually through a statistical, least
squares process. The transformation equations
are then used to convert coordinates
from the digitizer system to the map projection
system.

The transformation may be estimated in
the initial digitizing steps, and applied as the
coordinates are digitized from the map or
image. This "on-the-fly" transformation
allows data to be output and analyzed with
reference to map-projected coordinates. A
previously registered data layer or image
may be displayed on screen just prior to digitizing
a new map. Control points may then
be entered, the new map attached to the digitizing
table, and the map registered. The new
data may then be displayed on top of the previously
registered data. This allows a quick
check on the location of the newly digitized
objects against corresponding objects in the
study area.

Control points should meet several criteria.
First, they should be from a source with
the highest feasible accuracy.
Second, control points should be more accurate
than the desired overall positional accuracy
for the spatial data. Third, control
points should be evenly distributed throughout
the data area. A sufficient number of
control points should be collected, above the
minimum, to improve the statistically fit
transformation functions.


The coordinates of control points should
be known to a high degree of accuracy and
precision. Because high is subjectively
defined, there are many methods to determine
control point locations. Subcentimeter
accuracy may be required for property
boundary control points, while a few meters
may be acceptable for large-area vegetation
mapping. Common sources of control point
coordinates are traditional transit and
distance surveys, GNSS measurements, existing
cartometric quality maps, or existing
digital data layers on which suitable features
may be identified.


Transformation Equations 


Different base equations can be used in
coordinate transformation. The most common
is the affine coordinate transformation,
which employs linear equations to calculate
map coordinates. Map projection coordinates
are often referred to as eastings (E) and
northings (N), and are related to the X and Y
digitizer coordinates by the equations:

$$E = T_E + a_1 X + b_1 Y \quad (4.1)$$

$$N = T_N + a_2 X + b_2 Y \quad (4.2)$$


Equations 4.1 and 4.2 allow us to move
from the arbitrary screen or digitizer coordinate
system to the projected map coordinate
system. We know the X and Y coordinates
for every digitized point, line vertex, or
polygon vertex. We may calculate the E and
N coordinates by applying the above equations
to every digitized point.

$T_E$ and $T_N$ can be thought of as shifts in
the origins from one coordinate system to
the next. The $a_i$ and $b_i$ parameters incorporate
the change in scales and rotation angle
between coordinate systems. The affine is
the most commonly applied coordinate
transformation because it provides for these
three main effects of translation, rotation,
and scaling, and because it often introduces
less error than higher-order polynomial
transformations.
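
As a minimal sketch of how an affine transformation is applied once its parameters are known, the function below evaluates E = T_E + a1*X + b1*Y and N = T_N + a2*X + b2*Y for one digitized point. The parameter values are hypothetical, chosen only for illustration (a pure shift plus a scale change of 10 map units per digitizer unit, with no rotation).

```python
def affine_transform(x, y, t_e, a1, b1, t_n, a2, b2):
    """Convert digitizer (x, y) to map (easting, northing) with an affine model."""
    easting = t_e + a1 * x + b1 * y
    northing = t_n + a2 * x + b2 * y
    return easting, northing


# Hypothetical parameters: origin shifts T_E and T_N, scale of 10, no rotation.
params = dict(t_e=500_000.0, a1=10.0, b1=0.0, t_n=5_000_000.0, a2=0.0, b2=10.0)

digitized_vertices = [(103.0, -100.1), (0.0, 0.0), (50.0, 25.0)]
for x, y in digitized_vertices:
    e, n = affine_transform(x, y, **params)
    print(f"({x}, {y}) -> E = {e:.1f}, N = {n:.1f}")
```

The same six numbers are applied to every point, line vertex, and polygon vertex in the layer.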


Figure 4-32: An example of control point locations from a road data layer, and corresponding digitizer
and map projection coordinates.


The affine system of equations has six
parameters to be estimated: $T_E$, $T_N$, $a_1$, $a_2$, $b_1$,
and $b_2$. Each control point provides E, N, X,
and Y coordinates, and allows us to write
two equations. For example, we may have a
control point consisting of a precisely surveyed
center of a road intersection. This
point has digitizer coordinates of X = 103.0
centimeters and Y = -100.1 centimeters, and
corresponding Earth-based map projection
coordinates of E = 500,083.4 and N =
5,003,683.5. We may then write two equations
based on this control point:

$$500{,}083.4 = T_E + a_1(103.0) + b_1(-100.1) \quad (4.3)$$

$$5{,}003{,}683.5 = T_N + a_2(103.0) + b_2(-100.1) \quad (4.4)$$


We cannot find a unique solution to
these equations, because there are six
unknowns ($T_E$, $T_N$, $a_1$, $a_2$, $b_1$, $b_2$) and only
two equations. We need as many equations
as unknowns to solve a linear system of
equations. Each control point gives us two
equations, so we need a minimum of three
control points to estimate the parameters of
an affine transformation. Statistical estimation
requires at least four control points,
one more than the minimum, so that there are
residuals from which to estimate error.
As with all statistical estimates, more control
points are better than fewer, but we will
reach a point of diminishing returns after
some number of points, typically somewhere
between 18 and 30 control points.
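
The estimation step can be sketched in plain Python. This is an illustrative least-squares fit, not the routine used by any particular GIS package: it builds the 3 x 3 normal equations for the model E = T_E + a1*X + b1*Y (and likewise for N) and solves them with Cramer's rule. The function names and the four control points are hypothetical; the control points were generated from E = 1000 + 2X - 0.5Y and N = 5000 + 0.5X + 2Y so the recovered parameters can be checked by eye.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m * p = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; parameters are not unique")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Least-squares affine fit from (x, y, e, n) control point tuples.

    Returns ((t_e, a1, b1), (t_n, a2, b2)).
    """
    if len(control_points) < 3:
        raise ValueError("an affine fit needs at least three control points")
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_e = [0.0] * 3
    rhs_n = [0.0] * 3
    for x, y, e, n in control_points:
        row = (1.0, x, y)  # one design-matrix row per control point
        for i in range(3):
            rhs_e[i] += row[i] * e
            rhs_n[i] += row[i] * n
            for j in range(3):
                normal[i][j] += row[i] * row[j]
    return _solve3(normal, rhs_e), _solve3(normal, rhs_n)


# Hypothetical control points: digitizer x, y and surveyed E, N.
points = [(0, 0, 1000, 5000), (10, 0, 1020, 5005),
          (0, 10, 995, 5020), (10, 10, 1015, 5025)]
(t_e, a1, b1), (t_n, a2, b2) = fit_affine(points)
print(round(t_e, 6), round(a1, 6), round(b1, 6))
print(round(t_n, 6), round(a2, 6), round(b2, 6))
```

With exactly three non-collinear points the fit passes through every control point; with four or more, the residuals at the control points become available for judging the fit.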

The affine coordinate transformation is
usually fit using a statistical method that
minimizes the root mean square error
(RMSE). The RMSE is defined as:

$$RMSE = \sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n}} \quad (4.5)$$

where $n$ is the number of control points and
the $e_i$ are the residual distances
between the true E and N coordinates and the
E and N coordinates in the output data layer:

$$e_i = \sqrt{(x_t - x_d)^2 + (y_t - y_d)^2} \quad (4.6)$$

This residual is the difference between the
true coordinates $x_t$, $y_t$ and the transformed
output coordinates $x_d$, $y_d$. Figure 4-33
shows examples of this lack of fit. Individual
residuals may be observed at each control
point location.
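
The RMSE calculation itself is short. The sketch below assumes the usual definition, the square root of the mean of the squared residual distances, where each residual is the straight-line distance between a true and a transformed control point position; the coordinates are hypothetical.

```python
import math


def residual(true_xy, transformed_xy):
    """Distance between a true position and its transformed position."""
    return math.hypot(true_xy[0] - transformed_xy[0],
                      true_xy[1] - transformed_xy[1])


def rmse(true_points, transformed_points):
    """Root mean square error over paired control point positions."""
    residuals = [residual(t, d) for t, d in zip(true_points, transformed_points)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


true_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
transformed_points = [(3.0, 4.0), (10.0, 0.0), (0.0, 10.0)]  # one point is 5 units off
print(round(rmse(true_points, transformed_points), 3))  # 2.887
```

Looking at the individual residuals, and not only the summary RMSE, is what reveals which control point is responsible for a poor fit.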

A statistical method for estimating
transformation equations is preferred
because it also identifies transformation
error. Control point coordinates contain
unavoidable measurement errors. A statistical
process provides an RMSE, a summary
of the differences between the "true" (measured)
and predicted control point coordinates.
It provides one index of the quality
of a fit transformation. The
RMSE will usually be less than the true
transformation error at a randomly selected
point, because we are actively minimizing
the N and E residual errors when we statistically
fit the transformation equations. However,
the RMSE is an index of accuracy, and
a lower RMSE generally indicates a more
accurate affine transformation.

Estimating the coordinate transformation
parameters is often an iterative process.
Typically, control points are entered, the
transformation parameters estimated,
and the overall RMSE and individual point E
and N errors evaluated. Suspect points are
checked, blunders fixed, and the transformation
re-estimated and errors evaluated until a
final transformation is estimated.


Other coordinate transformations are
sometimes used. A conformal coordinate
transformation is similar to the affine,
except that it requires equal scale changes in
the X and Y directions. This results in a system
of equations with only four unknown
parameters, and so the conformal may be
estimated when only two control points are
available.

Higher-order polynomial transformations
are sometimes used to transform
among coordinate systems. An example of a
2nd-order polynomial is:
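
A general second-order form consistent with the surrounding description (squared and XY cross-product terms added to the affine model, six parameters per equation) may be written as below. The coefficient names $c$, $d$, and $f$ are assumptions introduced here for illustration and may differ from the notation of the original equation 4.7.

```latex
E = T_E + a_1 X + b_1 Y + c_1 X^2 + d_1 XY + f_1 Y^2

N = T_N + a_2 X + b_2 Y + c_2 X^2 + d_2 XY + f_2 Y^2
```

Each equation has six unknowns and each control point contributes one equation for E and one for N, which is why at least six control points are needed.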


Here, the combined powers of the X and
Y variables may be up to 2. This allows for
curvature in the transformation in both the X
and Y directions. A minimum of six control
points is required to fit this second-order
polynomial transformation, and seven are
required when using a statistical fit. The estimated
parameters $T_E$, $T_N$, $a_1$, $a_2$, $b_1$, and $b_2$
will be different in equations 4.1 and 4.2
when compared to 4.7, even if the same set
of control points is used for both statistical
fits. We change the form of the equations by
including the higher-order squared and XY
cross-product terms, and all estimated
parameters will vary.


A Caution When Evaluating Transformations


Selecting the "best" coordinate transformation
to apply is a subjective process,
guided by multiple goals. We hope to
develop an accurate transformation based on
a large set of well-distributed control points.
Isolated control points that substantially
improve our coverage may also contribute
substantially to our transformation error.

There are no clear rules on the number
of points versus distribution of points tradeoff,
but it is typically best to strive for the
widest distribution of points. We want at
least two control points in each quadrant of
the working area, with a target of 20% in
each quadrant. These goals are often not
possible. The transformation equation
should be developed with the following
observations in mind.


First, bad control points happen, but we
should thoroughly justify the removal of any
control point. Every attempt should be made
to identify the source of the error, either in
the collection or in the processing of field
coordinates, the collection of image coordinates,
or in some blunder in coordinate transcription.

Second, a lower RMSE does not necessarily mean a
better transformation. The RMSE is a useful
tool when comparing among transformations
that have the same model form, for example,
when comparing one affine to another affine.
The RMSE is not useful when comparing
among different model forms, for example,
when comparing an affine to a second-order
polynomial. The RMSE is typically lower
for higher-order polynomials than an affine
transformation, but this does not mean the
higher-order polynomial provides a better
fit. High-order polynomials allow more flexibility
in warping the surface to fit the control
points. Unfortunately, this warping may
significantly deform the non-control-point
coordinates, and add large errors when the
transformation is applied to all data in a
layer (Figure 4-34). Thus, high-order polynomials
and others should be used with caution.

Finally, independent tests make the best
comparisons among transformations. A
completely independent set of widely distributed
test points is ideal, but these rarely
exist. We often use a "bootstrap" approach,
a form of leave-one-out cross-validation,
that successively removes points. One point
is withheld, the transformation estimated,
and the error at the withheld point calculated.
The point is replaced and the next
point withheld, fitting the same type of
transformation. The equations will be
slightly different. The error at this second
withheld point is then calculated. This process
is repeated for each control point, and a
mean error calculated.
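
The withholding procedure described above can be sketched for an affine model as follows. The code is illustrative only: the affine fit is a plain least-squares solution of the normal equations, the function names are hypothetical, and the control points are synthetic, generated from E = 1000 + 2X - 0.5Y and N = 5000 + 0.5X + 2Y with a 3-unit easting blunder added to the last point.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3x3 system m * p = v with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points do not determine an affine fit")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(points):
    """Least-squares affine fit; returns ((t_e, a1, b1), (t_n, a2, b2))."""
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_e, rhs_n = [0.0] * 3, [0.0] * 3
    for x, y, e, n in points:
        row = (1.0, x, y)
        for i in range(3):
            rhs_e[i] += row[i] * e
            rhs_n[i] += row[i] * n
            for j in range(3):
                normal[i][j] += row[i] * row[j]
    return solve3(normal, rhs_e), solve3(normal, rhs_n)


def leave_one_out_errors(points):
    """Withhold each control point in turn, refit, and measure the error there."""
    errors = []
    for k, (x, y, e, n) in enumerate(points):
        kept = points[:k] + points[k + 1:]
        (t_e, a1, b1), (t_n, a2, b2) = fit_affine(kept)
        e_hat = t_e + a1 * x + b1 * y
        n_hat = t_n + a2 * x + b2 * y
        errors.append(math.hypot(e - e_hat, n - n_hat))
    return errors


digitizer_xy = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (2, 8)]
exact = [(x, y, 1000 + 2 * x - 0.5 * y, 5000 + 0.5 * x + 2 * y)
         for x, y in digitizer_xy]
# Copy of the exact data with a 3-unit easting blunder at the last point.
with_blunder = exact[:-1] + [(2, 8, exact[-1][2] + 3.0, exact[-1][3])]

errors = leave_one_out_errors(with_blunder)
print([round(err, 2) for err in errors])
print("mean withheld error:", round(sum(errors) / len(errors), 2))
```

Because each error is measured at a point that was not used in that fit, the mean withheld error is a more honest comparison between model forms than the RMSE of the fitted points.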
Control Point Sources


Control points may be obtained from a
variety of sources. Traditional ground surveys
based on optical surface measurements
are a common, although decreasingly used,
method for determining control point locations.
Federal, state, county, and local governments
all maintain a set of accurately
surveyed locations (Figure 4-35), and these
points may be used as control points or as
starting points for additional surveys. Many
of these known points have been established
using traditional surveying techniques. The
ground survey network is often quite sparse
and insufficient for registering many large-scale
maps or images. The global positioning
system (GPS), GLONASS, and Galileo
are Global Navigation Satellite Systems
(GNSS) that allow us to establish control
points. GNSS, discussed in detail in Chapter
5, can help us obtain the coordinates of control
points that are visible on a map or
image. GNSS are particularly useful because
we may quickly survey widely spaced
points. GNSS positional accuracy depends
on the technology and methods employed; it
typically ranges from subcentimeter (tenths
of inches) to a few meters (tens of feet).
Most points recently added to the NGS and
other government-maintained networks were
measured using GNSS technologies.


Existing maps are another common 
source of control points. Point locations are 
plotted and coordinates often printed on 
maps; for example, the corner location 
coordinates are printed on USGS quadrangle 
maps. Road intersections and other 
well-defined locations are often represented on 
maps. If enough recognizable features can 
be identified, then control points may be 
obtained from the maps. 

Existing digital images and data layers 
may also provide control points. A short 
description of these digital data sources is 
provided here, and expanded descriptions of 
these and other digital data are provided in 
Chapter 7. Orthocorrected digital aerial 
photos, road networks, digital raster graphics, 
and many other digital data layers may 
provide points that are accurately placed and 
unambiguously identifiable. 


Figure 4-35: Previous surveys are a common 
source of control points. 


Map Projection vs. Transformation 


Map transformations should not be 
confused with map projections. A map 
transformation typically employs a 
statistically fit linear equation to convert 
coordinates from one Cartesian coordinate system 
to another. A map projection, described in 
Chapter 3, differs from a transformation in 
that it is an analytical, formula-based 
conversion between coordinate systems, 
usually from a curved, latitude/longitude 
coordinate system to a Cartesian coordinate 
system. No statistical fitting process is used 
with a map projection. 


Map transformations should rarely be 
used in place of map projection equations 
when converting geographic data between 
map projections. Consider an example 
when data are delivered to an organization 
in Universal Transverse Mercator (UTM) 
coordinates and are to be converted to State 
Plane coordinates prior to integration into a 
GIS database. Two paths may be chosen. 
The first involves projection from UTM to 
geographic coordinates (latitude and 
longitude), and then from these geographic 
coordinates to the appropriate State Plane 
coordinates. This is the correct, most 
accurate approach. 

An alternate and often less accurate 
approach involves using a transformation 
to convert between different map 
projections. In this case, a set of control points 
would be identified and the coordinates 
determined in both UTM and State Plane 
coordinate systems. The transformation 
coefficients would be estimated and these 
equations applied to all data in the UTM 
data layer. This new output data layer 
would be in State Plane coordinates. This 
transformation process should be avoided, 
as a transformation may introduce 
additional positional error. 


Summary 


Spatial data entry is a common activity 
for many GIS users. Although data may be 
derived from several sources, maps are a 
common source, and care must be taken to 
choose appropriate map types and to 
interpret the maps correctly when converting 
them to spatial data in a GIS. 

Maps are used for spatial data entry 
due to several unique characteristics. These 
include our long history of hardcopy map 
production, so centuries of spatial informa- 
tion are stored there. In addition, maps are 
inexpensive, widely available, and easy to 
convert to digital forms, although the 
process is often time consuming, and may be 
costly. Maps are usually converted to 
digital data through a manual digitizing 
process, whereby a human analyst traces and 
records the location of important features. 
Maps may also be digitized via a scanning 
device. 

The quality of data derived from a map 
depends on the type and size of the map, 
how the map was produced, the map scale, 
and the methods used for digitizing. 
Large-scale maps generally provide more 
accurate positional data than comparable 
small-scale maps. Large-scale maps often have 
less map generalization, and small 
horizontal errors in plotting, printing, and 
digitizing are magnified less during conversion of 
large-scale maps. 

Snapping, smoothing, vertex thinning, 
and other tools may be used to improve the 
quality and utility of digitized data. These 
methods are used to ensure positional data 
are captured efficiently and at the proper 
level of detail. 

Maps and other data often need to be 
converted to a target coordinate system via 
a map transformation. Transformations are 
different from map projections (discussed 
in Chapter 3), in that a transformation uses 
an empirical, least squares process to 
convert coordinates from one Cartesian system 
to another. Transformations are often used 
when registering digitized data to a known 
coordinate system. Map transformations 
should not be used when a map projection 
is called for. 

Cartography is an important aspect of 
GIS, because we often communicate 
spatial information through maps. Map design 
depends on both the target audience and 
purpose, setting and modes of map 
viewing, and available resources. Proper map 
design considers the scale, symbols, labels, 
legend, and placement to effectively com- 
municate the desired information. 

Metadata are the "data about data." 
They describe the content, origin, form, 
coordinate system, spatial and attribute 
data characteristics, and other relevant 
information about spatial data. Metadata 
facilitate the proper use, maintenance, and 
transfer of spatial data. Metadata standards 
have been developed, both nationally and 
internationally, with profiles used to cross- 
reference elements between metadata stan- 
dards. Metadata are a key component of 
spatial data, and many organizations do not 
consider data complete until metadata have 
been created. 


Study Questions 


4.1 - Which is the larger-scale map: 
a) 1:5,000, or 1:15,000? 
b) 1:5,286, or 1 inch to a mile? 
c) 1:1,000,000, or 1 cm to 1 km? 
d) 1:50,000, or 0.00025 
Әѕл.ог їл? 


4.2 - Which is a larger-scale map: 
a) 1:20,000, or 1:1,000,000? 
b) 1 centimeter to 1,000 meters, or one yard to a mile? 
c) 1 inch equals 1 mile, or 1:100,000? 
d) 1 cm to 1 km, or 1 inch to a mile? 
e) 1 mm to 1 km, or 1:1,500,000? 


4.3 - Describe three different types of generalization. 


4.4 - Identify the kind of generalization at the labeled locations a through d in the 
map below, left, compared to the “truth” in the image, below right. Categorize the 
generalizations as fused, simplified, displaced, omitted, or exaggerated. 


4.5 - Identify the kind of generalization at the labeled locations a through d in the 
map below, left, compared to the “truth” in the image, below right. Categorize the 
generalizations as fused, simplified, displaced, omitted, or exaggerated; or if it 
doesn't fit in one of these categories, then categorize it as "other," and describe the 
generalization. 


4.6 - What are the most common map media? Why? 


4.7 - Is media deformation more problematic with large-scale maps or small-scale 
paper maps? Why? 


4.8 - Which map typically shows more detail - a large-scale map or a small-scale 
map? Can you give three reasons why? 


4.9 - Complete the following table that shows scale measurements and calculations. 


Ground distance     | Corresponding map      | Scale 
and units           | distance and units     | 

13,280 feet         | 6.4 inches             | 1:24,900 
126.4 kilometers    | 25.28 centimeters      | 
123.6 miles         | 228 inches             | 
407 meters          | ____ centimeters       | 1:5,025 
____ kilometers     | 462 inches             | 1:249,685 


4.10 - Complete the following table that shows scale measurements and calculations. 


Ground distance     | Corresponding map      | Scale 
and units           | distance and units     | 

17,120 kilometers   | 16.85 inches           | 1:40,000,935 
23.4 kilometers     | 117 centimeters        | 
164 miles           | 93 inches              | 
102.0 meters        | ____                   | 1:5,500 
____                | 10.24 inches           | 1:2,000,000 


4.11 - What is snapping in the context of digitizing? What are undershoots and 
overshoots, and why are they undesirable? 


4.12 - Identify a characteristic feature or error in digitizing at each of the labeled 
letter locations in the drawing below; for example, node, overshoot, missing label, etc.: 


4.13 - Identify a characteristic feature or error in digitizing at each of the labeled 
letter locations in the drawing below; for example, node, overshoot, missing label, etc.: 


4.14 - Sketch the results of combined node (open circle), vertex (closed circle), and 
edge (lines) snapping with a snap tolerance of a) a distance of 5 units, and b) a 
distance of 10 units, as shown by the snap circles. Note the radius and not the diameter 
of these circles defines the snapping distance. 


5 unit radius 


10 unit radius 


4.15 - Sketch the results of combined node (open circle), vertex (closed circle), and 
edge (lines) snapping with a snap tolerance of a) a distance of 5 units, and b) a 
distance of 10 units, as shown by the snap circles. Note the radius and not the diameter 
of these circles defines the snapping distance. 


4.16 - What are splines, and how are they used during digitizing? 


4.17 - a) Why is line thinning sometimes necessary? 
b) Does increasing the width of the line thinning band tend to increase, decrease, 
or not affect the number of points removed? 
c) Does increasing the number of points initially spanned tend to increase, 
decrease, or not affect the number of points removed? 


4.18 - Contrast manual digitizing to the various forms of scan digitizing. What are the 
advantages and disadvantages of each? 


4.19 - What is the "common feature problem" when digitizing, and how might it be 
overcome? 

4.20 - Describe the general goal and process of map registration. 

4.21 - What are control points, and where do they come from? 


4.22 - Define an affine transformation, including the form of the equation. Why is it 
called a linear transformation? 


4.23 - What is the root mean square error (RMSE), and how does it relate to a coordi- 
nate transformation? 


4.24 - Is the average positional error likely to be larger, smaller, or about equal to the 
RMSE? Why? 


4.25 - Why are higher-order (polynomial) projections to be avoided under most 
circumstances? 


4.26 - Which of the following transformations will likely have the smallest average 
error at a set of independent test points? 

a) affine, RMSE = 14.23 b) affine, RMSE = 9.8 

c) 2nd-order polynomial, RMSE = 9.7 d) 3rd-order polynomial, RMSE = 6.45 


4.27 - Which of the following transformations will likely have the smallest average 
error at a set of independent test points? 

a) 1st-order polynomial, RMSE = 5.3 b) affine, RMSE = 9.8 

c) 2nd-order polynomial, RMSE = 4.9 d) 1st-order polynomial, RMSE = 9.9 


4.28 - Define and describe metadata. Why are metadata important? 


5 Global Navigation Satellite Systems and Coordinate Surveying 


Introduction 


Broadly defined, there are two common 
ways to obtain geographic coordinates. The 
first uses remote data collection, primarily 
from aerial and satellite cameras. Coordinate 
positions may be obtained to within a few 
centimeters (inches) from properly collected, 
carefully processed images. Digitizing from 
these sources is described in Chapter 4, and 
image systems and automated image 
extraction are described in Chapter 6. 

We also commonly collect coordinates 
using GNSS/GPS for field measurements, 
tools described in this chapter. We travel to a 
feature and physically occupy a location to 
measure unknown X, Y, and often Z 
coordinates. Measurement systems have become 
quite powerful, incorporating satellite and 
laser technologies, primarily Global Navigation 
Satellite Systems (GNSS), as well as 
traditional ground surveying methods. Field 
measurements may be accurate to within 
millimeters (tenths of inches). 

GNSS are satellite-based technologies 
that give precise positional information, day 
or night, in most weather and terrain 
conditions. GNSS technologies may help navigate 
and track any moving object that can carry a 
receiver. Receivers shrink in size, weight, 
and power requirements each year (Figure 
5-1). 

GPS, for Global Positioning System, is 
sometimes used synonymously for GNSS. 
GPS more specifically refers to a U.S.-based 
satellite navigation system, the first 
developed and deployed globally. 


Figure 5-1: A GNSS and data collection pack. 

Coordinate surveying is often used to 
complement GNSS measurements. Coordinate 
surveying uses optical and electronic 
angle and distance measurements, some 
already described in Chapter 3. Because both 
GNSS and coordinate surveying 
measurements are important, they will be covered in 
this chapter. 


GNSS Basics 


GNSS have had a pervasive impact in 
the geographic information sciences, and 
underlie almost all modern spatial data 
collection, either directly during field 
surveys, or indirectly to locate and orient image 
data. 


As of 2022 there are four functioning 
GNSS systems, and two regional systems. 
The U.S. NAVSTAR Global Positioning 
System (GPS) was the first deployed and is 
the most widely used system. There is an 
operational Russian system named 
GLONASS with 24-hour coverage worldwide. The 
Chinese Compass, or BeiDou, is a third 
satellite navigation system and includes a 
constellation of 30 positioning satellites with 
global coverage. A fourth system, Galileo, 
has been developed by a consortium of 
European governments and industries, with 
a total of 30 satellites in the constellation. 
There is a regional system by the Indian 
government (IRNSS), with seven satellites 
giving coverage of most of central Asia, 
operational in 2016, but semi-functional 
because of equipment failures. Another 
regional system, QZSS, covers Japan, east 
Asia, and the western Pacific Ocean. In the 
following discussion we use GNSS as a 
generic term for all global systems, and use 
GPS to refer specifically to the U.S. 
NAVSTAR system. 

There are three main components, or 
segments, of any GNSS (Figure 5-2). The 
first is the satellite segment. This is a 
constellation of satellites orbiting the Earth and 
transmitting positioning signals. The second 
component of any GNSS is a control 
segment. This includes tracking, 
communications, data gathering, integration, analysis, 
and control facilities. The third part of 
GNSS is the user segment, the GNSS 
receivers. 


Figure 5-2: The three segments that comprise a GNSS. 


A GNSS receiver is a device that records 
data transmitted by each satellite, and then 
processes these data to obtain 
three-dimensional coordinates (Figure 5-3). There is a 
wide array of receivers and methods for 
determining position. Receivers are often 
handheld devices with screens and 
keyboards, or electronic components mounted 
on cars and trucks, planes, or other objects. 


The satellite and control segments differ 
for each GNSS. The NAVSTAR GPS 
includes a constellation of satellites orbiting 
the Earth at an altitude of approximately 
20,000 km. Initial system design included 21 
active GPS satellites and three spares, 
distributed among six offset orbital planes. 
Every satellite orbits the Earth twice daily, 
and each satellite is usually above the flat 
horizon for eight or more hours each day. 
Experimental and successive operational 
satellites have outlasted their design life, so 
there have typically been more than 24 
satellites in orbit simultaneously. Between four and 
eight active satellites are typically visible 
from any unobstructed viewing location on 
Earth. 
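
The link between an altitude near 20,000 km and two orbits per day can be checked with Kepler's third law. The sketch below is our own illustration, not part of the GPS specification: it assumes a circular orbit, a mean Earth radius of 6,371 km, and the standard value of Earth's gravitational parameter, and the function name orbital_period_hours is hypothetical.

```python
import math

EARTH_GM = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
EARTH_RADIUS_M = 6.371e6    # mean Earth radius in metres (assumed spherical)


def orbital_period_hours(altitude_km):
    """Period of a circular orbit at the given altitude above the surface."""
    orbit_radius = EARTH_RADIUS_M + altitude_km * 1000.0
    period_s = 2.0 * math.pi * math.sqrt(orbit_radius ** 3 / EARTH_GM)
    return period_s / 3600.0


period = orbital_period_hours(20000.0)
print(f"period: {period:.2f} hours, orbits per day: {24.0 / period:.2f}")
```

The computed period is a little under 12 hours, consistent with each satellite circling the Earth about twice daily.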


GPS is controlled by a set of ground 
stations. These are used to observe, maintain, 
and manage satellites, communications, and 
related systems. There are five tracking 
stations in the GPS system, spread across the 
planet. Data are gathered from a number of 
sources by the stations, including satellite 
health and status from each GPS satellite, 
tracking information from each tracking 
station, timing data from the U.S. Naval 
Observatory, and surface data from the U.S. 
Defense Mapping Agency. A Master Control 
Station synthesizes information and 
broadcasts navigation, timing, and other data to 
each satellite. The Master Control Station 
also signals each satellite as appropriate for 
course corrections, changes in operation, or 
other maintenance. 


The GLONASS system is another 
currently operating GNSS. GLONASS was 
initiated by the former Soviet Union in the 
early 1970s. Satellites were first launched in 
the early 1980s, and the system became 
functional in the mid-1990s. The GLONASS 
system was designed for military navigation, 
targeting, and tracking, and is operated by 
the Russian Ministry of Defense, with 
control and tracking stations similar to those for 
the NAVSTAR GPS system. 


GLONASS was designed to include 21 
active satellites and three spares. New 
designs have been phased in as older satel- 
lites have expired, and system managers 
have focused on maximizing coverage over 
Russia. The GLONASS system is 
established, with a published renovation and 
maintenance plan, such that commercial 
manufacturers have developed dual 
GPS/GLONASS-capable receivers. 


The Chinese Compass (BeiDou) system 
consists of 30 satellites with attendant 
ground station infrastructure. Eighteen 
satellites were launched in 2018, and 10 
upgraded satellites are scheduled for launch 
in 2019. The constellation includes both 
geostationary and inclined orbiting satellites. 
The system is designed for both civilian and 
military use, with substantially augmented 
accuracies, up to a few centimeters within 
China, due to a network of local receivers. 

The European Galileo system also 
implements satellite, control, and user 
segments. There are 30 satellites in the 
complete Galileo constellation. Satellites are 
arranged in three orbital paths at a 54° 
orbital inclination, with a satellite altitude 
near 23,600 km above the Earth. This 
satellite constellation will provide better 
coverage of high northern latitudes than the U.S. 
NAVSTAR GPS system, to better serve 
northern Europe. Galileo will be managed 
through two control centers in Europe and 
20 Galileo Sensor Stations spread 
throughout the world to monitor, communicate with, 
and relay information among satellites and 
the control centers. 


Figure 5-3: A handheld GNSS receiver (left) and a GNSS receiver in use (right, 
courtesy Juniper Systems). 


GNSS Signals 


GNSS positioning is based on radio 
signals broadcast by each satellite. Systems 
vary, but all transmit on multiple frequencies 
and also send data needed to calculate 
positions from the main signals. As an example, 
the NAVSTAR GPS satellites broadcast 
positioning signals on three base frequencies 
(Table 5-1), the L1, L2, and L5. These 
carrier signals are modulated to produce coded 
signals, e.g., the C/A code at 1.023 MHz and 
the P and M code at 10.23 MHz. The L1 
signal carries both the C/A and P codes, while 
the L2 carries the P and M codes (Figure 
5-4). Additional signals provide information, 
and coded signals are sent containing 
satellite navigation and other information. These 
other signals have been added to improve 
function, for example, the CNAV for 
tracking, forward error correction code, and an 
MNAV for enhanced military applications. 
The coded signals (C/A, P, and M) are 
sometimes referred to as the pseudorandom code, 
because they appear quite similar to random 
noise. However, short segments of the code 
are unique for each satellite and time. A 
receiver decodes each signal to identify the 
satellite, transmission time, and satellite 
position at the time the signal was sent. The 
receiver combines this information from 
multiple satellites for positioning. The coded 
signal does repeat, but the repeat interval is 
long enough to not cause problems in 
positioning. 


Table 5-1: GPS Signals 


Name      Frequency (MHz) 
L1        1575.42 
L2        1227.6 
L5        1176.45 
P, M      10.23 
C/A       1.023 

Figure 5-4: Existing and proposed GNSS bands, not spaced to scale (courtesy ESA). 


Positions based on carrier signal 
measurements (L1, L2, and L5 frequencies for 
the NAVSTAR GPS), and positions based on 
multiple frequencies are inherently more 
accurate than those based on the coded 
signal or single frequency measurements. The 
mathematics and physics of carrier 
measurement are better suited for precise 
positioning. "Dual frequency" GNSS receivers are more 
accurate than single band receivers because 
they aid removal of various errors, primarily 
ionospheric errors, described later. 


Improved accuracy in GNSS positioning 
usually incurs added costs in equipment and 
in time spent collecting and processing data. 
Carrier measurements require more sophisti- 
cated and expensive receivers and must 
record signals for longer periods of time than 
code receivers. If the signal is blocked by a 
building, mountain, or other object, the 
signal may be lost momentarily and carrier 
phase measurements begun anew. This 
substantially reduces the efficiency of carrier 
phase data collection, although these 
constraints have decreased with modern 
receivers. Newer systems are often capable of 
tracking multiple GNSS constellations with 
hundreds of channels, reducing loss of lock. 


GNSS satellites also broadcast data on 
satellite status and location. Data streams go 
by various names, but using the GPS 
conventions, the information includes an 
almanac, data used to determine the satellite 
status, and ephemeris data of satellite tracks. 
These ephemerides allow a GNSS receiver 
to accurately calculate the position of the 
broadcasting satellite and the expected 
positions of other satellites. Satellite health, 
clock corrections, and other data are also 
transmitted. 


The various GNSS systems span a 
similar range of frequencies, and are organized 
so that there is little interference between 
any two signals, even when they share the 
same fundamental frequency (Figure 5-4). 
GLONASS broadcasts G1 and G2 carriers 
similar to GPS, and an additional G1/G10 
and experimental G3 signal at higher and 
lower frequencies. BeiDou/Compass and 
Galileo include the broadcasts of a range of 
signals on several fundamental carriers, 
including overlaps with the GPS, 
BeiDou/Compass, and GLONASS signals at various 
frequencies. When all GNSS constellations 
are fully operational there may be as many 
as 70 satellites, so multi-constellation 
receivers may provide robust, fast, accurate 
positioning. 


Range Distances 

GNSS positioning is based primarily on 
range distances. A range is a distance 
between two objects. For GNSS, the range is 
the distance between a satellite in space and 
a receiver (Figure 5-5). GNSS signals travel 
approximately at the speed of light. The 
range distance from the receiver to each 
satellite is calculated based on signal travel 
time from the satellite to the receiver: 

Range = speed of light * travel time     (5.1) 


Coded signals are used to calculate 
signal travel time by matching sections of the 
code. Timing information is sent with the 
coded signal, allowing the GNSS receiver to 
calculate the precise transmission time for 
each code fragment. The GNSS receiver also 
observes the reception time for each code 
fragment. The differences between 
transmission and reception times are used to 
calculate range distances, often at rates as high as 
a new range calculation for any satellite each 
second (Figure 5-6). 
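
The range equation can be written directly as code. This is a minimal sketch under our own assumptions: the function names and the sample travel time are hypothetical, and the signal is treated as moving at the vacuum speed of light along its whole path.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in a vacuum


def range_from_times(transmit_time_s, receive_time_s):
    """Range in metres from the transmission and reception times of a signal."""
    travel_time = receive_time_s - transmit_time_s
    return SPEED_OF_LIGHT * travel_time


def range_error_from_timing_error(timing_error_s):
    """Range error in metres produced by a given timing error in seconds."""
    return SPEED_OF_LIGHT * timing_error_s


# A hypothetical travel time of 67 milliseconds.
print("range (km):", range_from_times(0.0, 0.067) / 1000.0)
# A timing error of one microsecond.
print("range error (m):", range_error_from_timing_error(1e-6))
```

The second function shows why timing matters so much: a timing error of one millionth of a second changes the computed range by about 300 meters.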

Figure 5-5: A single satellite range measurement. 

Figure 5-6: A decoded C/A satellite signal provides a range measurement. 


Carrier phase GNSS is also based on a 
set of range measurements. In contrast to 
coded signals, the phase of the satellite 
signal is measured. Each wave transmitted at a 
given frequency is identical, and at any 
given point in time there is some unknown 
integer number of waves plus a partial wave 
that fit in the distance between the satellite 
and the receiver. Continuous carrier signal 
observations allow the calculation of 
wavelength number over the measurement 
interval, and then the calculation of very precise 
satellite ranges. 
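
The idea of a whole number of waves plus a partial wave can be illustrated numerically. The sketch below is ours and is deliberately simplified: it assumes the integer wave count is already known, which in practice is the hard part of carrier phase processing, and the function names and sample numbers are hypothetical. The L1 frequency of 1575.42 MHz is the published GPS value.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in a vacuum
L1_MHZ = 1575.42                # GPS L1 carrier frequency


def wavelength_m(frequency_mhz):
    """Carrier wavelength in metres for a frequency given in MHz."""
    return SPEED_OF_LIGHT / (frequency_mhz * 1e6)


def carrier_range_m(whole_waves, partial_wave, frequency_mhz=L1_MHZ):
    """Range from an integer wave count plus a fraction of one wave."""
    return (whole_waves + partial_wave) * wavelength_m(frequency_mhz)


print("L1 wavelength (m):", wavelength_m(L1_MHZ))
# Hypothetical count of 105,000,000 whole waves plus 0.25 of a wave.
print("range (m):", carrier_range_m(105_000_000, 0.25))
```

Because one L1 wave is only about 19 centimeters long, measuring the partial wave to a small fraction of a cycle gives ranges far more precise than those from code matching.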


Simultaneous range measurements from 
multiple satellites help estimate a receiver's 
location. A range measurement to one 
satellite places the receiver somewhere on a 
sphere (Figure 5-7a). Range measurements 
from two satellites place the receiver on a 
circle formed by the intersection of two 
spheres (Figure 5-7b). Range measurements 
from three satellites define three spheres that 
intersect at two points (Figure 5-7c). A 
sequence of range measurements through 
time from three satellites will reveal that one 
of the intersecting points remains nearly 
stationary, while the other point moves rapidly 
through space. The point moves because the 
size and relative geometry of the spheres 
change through time as the satellites change 
positions. Simultaneous measurements from 
four or more satellites (Figure 5-7d) are 
usually required to reduce receiver clock errors 
and to allow instantaneous position 
measurement with a moving receiver. Data collected 
from more than four satellites usually 
improves position accuracy. 
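
The intersection of range spheres can also be solved numerically. The following sketch is one possible approach under our own assumptions, not the algorithm of any specific receiver: it uses an iterative least-squares (Gauss-Newton) solution, treats the receiver clock error as a fourth unknown expressed as a distance, and works in an Earth-centered Cartesian system in meters. The function names and the satellite coordinates are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def solve_position(satellites, measured_ranges, iterations=10):
    """Estimate receiver x, y, z and clock error (as metres) from 4+ ranges."""
    est = [0.0, 0.0, 0.0, 0.0]  # start at the Earth's centre, zero clock error
    for _ in range(iterations):
        rows, residuals = [], []
        for (sx, sy, sz), rho in zip(satellites, measured_ranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Partial derivatives of (dist + clock error) for x, y, z, clock.
            rows.append([dx / dist, dy / dist, dz / dist, 1.0])
            residuals.append(rho - (dist + est[3]))
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
        jtr = [sum(r[i] * e for r, e in zip(rows, residuals)) for i in range(4)]
        step = solve_linear(jtj, jtr)
        est = [value + change for value, change in zip(est, step)]
    return tuple(est)


# Hypothetical satellite positions (metres) and a known receiver for checking.
sats = [(26.0e6, 0.0, 0.0), (15.0e6, 20.0e6, 5.0e6), (15.0e6, -18.0e6, 8.0e6),
        (18.0e6, 3.0e6, -19.0e6), (14.0e6, 5.0e6, 21.0e6)]
true_position = (6.371e6, 1000.0, -2000.0)
clock_error_m = 3000.0
ranges = [math.dist(s, true_position) + clock_error_m for s in sats]
print(solve_position(sats, ranges))
```

The fourth unknown is the reason at least four satellites are needed for an instantaneous fix: three ranges fix the position only if the receiver clock is assumed to be perfect.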


Positional Uncertainty 


Errors in range measurements and 
uncertainties in satellite location introduce 
errors into GNSS-determined positions 
(Figure 5-8). Range errors vary substantially 
even if range measurements are taken just a 
few seconds apart. Errors in the ephemeris 
data lead to erroneous estimates of the 
satellite position, causing location error. Clock, 
atmospheric, and ionospheric uncertainties 
add error to range measurements, resulting 
in a band of range uncertainty around the 
GNSS receiver position. 

Several methods help us improve 
accuracy, the simplest of which is point 
averaging on a stationary receiver. Most receivers 
may estimate a new position, or fix, every 
second. Averaging yields a cluster of indi- 
vidual fixes distributed about the mean loca- 
tion, removing high frequency errors and 
helping quantify positional uncertainty. 
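
Point averaging is simple to express in code. This is a minimal sketch with hypothetical names and sample data: it assumes the fixes are already in a projected coordinate system with units of meters, so that simple means and distances are meaningful.

```python
import math
import statistics


def average_fixes(fixes):
    """Return the mean (x, y) position and the RMS distance of fixes from it."""
    mean_x = statistics.fmean([x for x, _ in fixes])
    mean_y = statistics.fmean([y for _, y in fixes])
    sq_dists = [(x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in fixes]
    rms_spread = math.sqrt(statistics.fmean(sq_dists))
    return (mean_x, mean_y), rms_spread


# Hypothetical one-per-second fixes (easting, northing) at a stationary point.
fixes = [(500012.4, 4980103.1), (500010.9, 4980104.6), (500013.2, 4980102.2),
         (500011.5, 4980105.0), (500012.0, 4980103.7)]
mean_position, spread = average_fixes(fixes)
print("mean position:", mean_position)
print("RMS spread (m):", spread)
```

The spread describes how tightly the fixes cluster, not how close the mean is to the true location, so a small spread does not rule out a shared bias.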

Averaging doesn’t remove long-term 
bias in calculated positions, and we can't 
average while moving. Alternative methods 
for reducing positional error rely on 
reducing the several sources of range errors. 


Sources of Range Error 


Ionospheric and atmospheric delays are 
common sources of GNSS range error. 
Range calculations depend on the speed of 
light, a constant when light is passing 
through a uniform electromagnetic field and 
in a vacuum, but not constant in space and 
our atmosphere. The Earth is 
surrounded by a varying density of charged 
particles in the ionosphere, formed by incoming 
solar radiation, which strips electrons from 
elements in the upper atmosphere. Changes 
in the charged particle density may affect 
GNSS transmissions. 


Atmospheric density is significantly 
different from that of a vacuum. Density 


Range errors occur because the GNSS signal 
velocity changes as it passes through the 
ionosphere and atmosphere; some systems 
allow satellite screening based on horizon 
angle, to reduce atmospheric path effects on 
accuracy (Figure 5-9). 

Errors can be reduced by receiver 
design, because ionospheric effects depend 
on frequency. Dual-frequency receivers 
collect information on multiple GNSS signals 
simultaneously and use sophisticated models 
to remove most of the ionospheric errors. 
Dual-frequency receivers are dropping in 
cost rapidly, and may soon be widely 
affordable. Atmospheric range errors are still 
largely present after dual-frequency analysis, 
so these are best removed by differential 
correction, described in the next section. 
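
One common way to exploit the frequency dependence is the first-order "ionosphere-free" combination of ranges measured on two carriers. The sketch below is our illustration, not a description of any particular receiver's models: it assumes the ionospheric delay is proportional to one over the frequency squared, a standard first-order approximation, and the function name and sample delay are hypothetical.

```python
L1_MHZ = 1575.42  # GPS L1 carrier frequency
L2_MHZ = 1227.60  # GPS L2 carrier frequency


def ionosphere_free_range(range_l1_m, range_l2_m, f1=L1_MHZ, f2=L2_MHZ):
    """Combine two ranges so a delay proportional to 1/f^2 cancels."""
    f1_sq, f2_sq = f1 * f1, f2 * f2
    return (f1_sq * range_l1_m - f2_sq * range_l2_m) / (f1_sq - f2_sq)


# Simulated example: a true range plus a hypothetical 5 m delay on L1.
true_range = 20_000_000.0
delay_l1 = 5.0
delay_l2 = delay_l1 * (L1_MHZ / L2_MHZ) ** 2  # larger delay at lower frequency
combined = ionosphere_free_range(true_range + delay_l1, true_range + delay_l2)
print("error after combination (m):", combined - true_range)
```

The combination removes only the frequency-dependent part of the delay. Delays in the lower atmosphere affect both frequencies almost equally, so they pass through unchanged.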

System operation and delays also add 
to range error. Satellite tracking is imperfect, 
and timing and other signals are slowed 
during transmission through the entire 
system. Atomic clocks on the satellite may be in 
error, although these errors are typically small. 
Many of these errors may be partially 
removed in rigorous analytical 
post-processing, rarely applied due to the complexity of 
the calculations and needed additional data. 
Differential correction, described later, also 
removes much of the systemic range error. 

Receivers also introduce errors into 
GNSS positions. Receiver clocks may 
contain biases. Signals may reflect off of objects 
prior to reaching the antenna. These 
reflected, or multipath, signals have a longer, 
erroneous range than direct GNSS signals. 
Multipath signals often have lower power, 
and so may be screened by setting a 
threshold signal-to-noise ratio. Signals with high 
noise relative to the mean signal strength are 
ignored. Multipath signals may also be 
screened by properly designed antennas, 
most commonly those with a ground plane, a 
metal disk under the antenna. Multipath 
signals are most commonly a problem in urban 
settings that have an abundance of strong 
corner reflectors, such as the sides of 
buildings and streets. This is often a large source 
of range error, particularly when collecting 
data without a specialized multipath 
rejection antenna. 


Satellite Geometry and Dilution 
of Precision 


The geometry of the GNSS satellite con- 
stellation is another factor that affects posi- 
tional accuracy. Range errors create an area 
of uncertainty perpendicular to the GNSS 
signal transmission direction. These areas of 
uncertainty may be visualized as a set of 
nested spheres, with the true position 
somewhere within the volume defined by the 
intersection of these spheres (Figure 5-10). 
These areas of uncertainty intersect, and the 
smaller the intersection area, the more 
accurate the position fixes are likely to be. 
Signals from widely spaced satellites are 
complementary because they result in a 
smaller area of uncertainty. Signals from 
satellites in close proximity overlap over broad 
areas, resulting in large areas of positional 
uncertainty. Widespread satellite 
constellations provide more accurate GNSS position 
measurements. 

Satellite geometry is summarized in a
number called the Dilution of Precision, or
DOP (Figure 5-11). There are various kinds
of DOPs, including the Horizontal (HDOP),
Vertical (VDOP), and Positional (PDOP)
Dilution of Precision. The PDOP is most
commonly used and is derived from the ratio
of the volume of a tetrahedron created by the
four most widespread observed satellites to
the volume defined by the ideal tetrahedron.
This ideal tetrahedron
is formed by one satellite overhead and three
satellites spaced at 120-degree intervals
around the horizon. This constellation is
assigned a PDOP of one, and closer groupings
of satellites, which enclose smaller volumes,
have higher PDOPs. Lower
PDOPs are better. Most GNSS receivers
review the almanac transmitted by the GNSS
satellites and attempt to obtain measurements
that include the constellation with the
lowest PDOP. If this best constellation is not
available, for example, because some satellites
are not visible, successively poorer constellations
are tested until the best available constellation
is found. The receivers typically
provide a measurement of PDOP while data
are collected, and a maximum PDOP threshold
may be specified, above which data are
not collected.
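
The volume comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the computation performed inside any particular receiver: satellites are placed on a unit sphere from hypothetical azimuth and elevation angles, and the index is taken as the ideal volume divided by the observed volume, so that the ideal constellation scores one and tighter groupings score higher. Receivers commonly derive DOP values from a matrix of satellite directions instead; the function names and angles here are invented for the example.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """East, north, up components of the direction to a satellite."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


def tetrahedron_volume(a, b, c, d):
    """Volume of the tetrahedron with corners a, b, c, d (scalar triple product / 6)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return abs(sum(u[i] * cross[i] for i in range(3))) / 6.0


# One satellite overhead, three on the horizon at 120-degree spacing.
IDEAL = [(0, 90), (0, 0), (120, 0), (240, 0)]


def geometry_index(satellites):
    """Ideal volume over observed volume for four (azimuth, elevation) pairs."""
    ideal = tetrahedron_volume(*(unit_vector(*s) for s in IDEAL))
    observed = tetrahedron_volume(*(unit_vector(*s) for s in satellites))
    return math.inf if observed == 0 else ideal / observed


# Hypothetical constellation with all satellites bunched high in the sky.
clustered = [(0, 90), (0, 60), (120, 60), (240, 60)]
print(geometry_index(IDEAL), geometry_index(clustered))
```

A maximum threshold would then be applied by discarding any fix whose index exceeds the chosen limit.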


Range errors and DOPs combine to 
affect GNSS position accuracies. There are 
many sources of range error, and these combine
to form an overall range uncertainty for
the measurement from each visible GNSS
satellite. If more precise coordinate locations
are required, then the choices are to use
equipment that makes more precise range
measurements, and/or to collect data when 
DOPs are low. 
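
A common rule of thumb, not stated explicitly above, treats the expected position error as roughly the range error multiplied by the DOP. The sketch below applies that rule; the function name and the sample values are assumptions for illustration only.

```python
def expected_position_error(range_error_m, dop):
    """Rule-of-thumb position error: range uncertainty scaled by the DOP."""
    return dop * range_error_m


# Hypothetical 5 m range uncertainty under good, fair, and poor geometry.
for dop in (1.5, 4.0, 8.0):
    print(dop, expected_position_error(5.0, dop))
```

The two options in the text correspond to the two factors: better equipment lowers the range error, and collecting under favorable geometry lowers the DOP.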


GNSS accuracies depend on the type of 
receiver, atmospheric and ionospheric conditions,
the number of range measurements,
the satellite constellation, and the algorithms
used for position determination (Figure 5-12).
Current C/A code receivers typically
provide accuracies between 3 and 30 m for a
single fix. Errors larger than 100 m for a single
fix occur occasionally. Accuracies may
be improved substantially, to between 2 and
15 m, when multiple fixes are averaged. The
longer the data collection time, the greater
the accuracy. Improvements come largely
from reducing the impact of rarer, large
errors, but average errors are rarely
below 1 m when using a single C/A code
receiver.
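
Averaging of repeated fixes can be sketched as below. The coordinates are hypothetical easting and northing values in meters, and average_fix is a name invented for this example; real collection software may weight or filter fixes before averaging.

```python
from statistics import mean


def average_fix(fixes):
    """Average a list of (easting, northing) fixes into one position."""
    eastings = [e for e, _ in fixes]
    northings = [n for _, n in fixes]
    return mean(eastings), mean(northings)


# Three hypothetical fixes logged at one stationary point.
fixes = [(100.0, 200.0), (104.0, 198.0), (102.0, 202.0)]
print(average_fix(fixes))
```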


Accuracies when using carrier phase, 
dual frequency, or pairing data across different
receivers are much higher, frequently to
within a few centimeters. These accuracies
come at the cost of longer data collection
times, although clever analysis and system
design have lowered this to a few to tens of
minutes, rather than hours as in the recent
past. Fast, inexpensive, dual-frequency
chipsets are becoming available that may
provide 20 cm (eight inch), real-time accuracy.
Another technique, differential correction,
is the most reliable means of obtaining
30 cm (one foot) accuracy. When combined,
multi-frequency and differential correction
may yield accuracies of a few centimeters
with occupation times measured in minutes.
Differential correction is described in the
following section.


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Differential Correction 


The previous sections have focused on
GNSS position measurements collected with
a single receiver. This operating mode is
known as autonomous GNSS positioning.
An alternative method, known as differential
positioning, employs two or more receivers.
Differential positioning measurements are
used primarily to remove most of the range
errors and thus greatly improve the accuracy
of GNSS positions (Figure 5-12). However,
differential positioning is not always
employed, because single receiver positioning
is accurate enough for many applications,
and differential positioning requires
more time and/or greater expense.


Differential GNSS positioning entails
establishing at least one independent base
station receiver at a known coordinate location
(Figure 5-13). The true coordinate location
of the base station is typically
determined using high-accuracy surveying
methods, for example, repeated astronomical
observations, highest-accuracy GNSS, or
precise ground surveys, as described in
Chapter 3.

We use the base station to estimate range
measurement errors for each position fix.
Remember that GNSS is based on a set of
range measurements, and these range measurements
contain errors. Some of these
errors are due to uncertainty in the measured
travel times from the satellite to the receiver.
These combined travel time errors, also
known as timing errors, are often among the
largest sources of positional uncertainty.

Figure 5-13: Differential GNSS positioning. Simultaneous GNSS measurements are collected at field-roving (unknown) and base (known) sites.

In differential correction, we use the 
known base station position to estimate the 
timing errors and hence range errors. Each
GNSS satellite broadcasts its position along
with the ranging signal. The "true" distance
from a given satellite to the base station can
be calculated because the base station and
satellite locations are known. Note the qualifying
quote marks around the true distance.
We cannot exactly define where the satellite
is, and the base station coordinates have
some (usually small) level of uncertainty
associated with them. If we are very careful
about surveying the location of our base station,
then the errors in the base-to-satellite
measurement are almost always smaller than
the range errors contained in our uncorrected
timing measurement.

The difference between the true distance
and GNSS-measured distance is used
to estimate the timing error for a given satellite
at any given time. The timing errors
change each second, so they should be measured
frequently.

Timing corrections may be applied to
the range measurements collected by a roving
receiver (Figure 5-14). These roving
receivers are used to measure GNSS positions
at field locations with unknown coordinates.
The timing error, and hence range
error, for each satellite observed at a field
location is assumed to be the same as the
range error observed simultaneously at the
base station. We adjust the timing of each
satellite measurement made by the rover,
then re-calculate the rover's position. This
adjustment usually reduces each range error
and substantially improves each position fix
taken with the roving field receivers.

The timing errors change across the surface
of the Earth, and this places a restriction
on the use of differential GNSS correction.
Our roving receivers must be "near" our
base station for differential correction to
work. A substantial portion of the range
error is due to atmospheric and ionospheric
interference with the GNSS signal. Fortunately,
these conditions often vary slowly
with distance through the atmosphere, so
errors in one location are likely to be similar
to errors in a nearby location. Therefore, as
long as the rover is relatively near the base
station, within a few tens to hundreds of
kilometers, we may expect differential correction
to improve our position measurements.

Differential correction requires the
paired receivers collect data from a similar
set of satellites. We cannot fix a timing error
we do not measure, so the two receivers
must measure the same satellites at the same
time. Any four satellites providing acceptable
PDOPs will suffice, although more satellites
are better.
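
The base-and-rover procedure described above can be sketched in Python. This is a simplified illustration that stops at corrected ranges and does not re-solve the rover position; the function names, the satellite identifiers, and every coordinate and range value are hypothetical.

```python
import math


def range_corrections(base_position, satellite_positions, base_measured_ranges):
    """Correction per satellite: "true" base-to-satellite range minus measured range."""
    corrections = {}
    for sat_id, measured in base_measured_ranges.items():
        true_range = math.dist(base_position, satellite_positions[sat_id])
        corrections[sat_id] = true_range - measured
    return corrections


def correct_rover_ranges(rover_measured_ranges, corrections):
    """Apply base corrections; satellites the base did not observe are dropped."""
    return {sat_id: measured + corrections[sat_id]
            for sat_id, measured in rover_measured_ranges.items()
            if sat_id in corrections}


# Hypothetical Cartesian coordinates and ranges, in meters.
base = (0.0, 0.0, 0.0)
satellites = {"A": (0.0, 0.0, 20_000_000.0)}
corrections = range_corrections(base, satellites, {"A": 20_000_012.0})

# Satellite "B" was seen only by the rover, so it cannot be corrected.
corrected = correct_rover_ranges({"A": 20_000_500.0, "B": 21_000_000.0}, corrections)
print(corrections, corrected)
```

Dropping uncorrected satellites mirrors the requirement that both receivers observe the same satellites at the same time.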

Successful differential correction also
requires near simultaneity in the base and
rover measurements. Errors change rapidly
through time. If the base and rover measurements
are collected more than a few tens of
seconds apart, they do not correspond to the
same set of errors, and thus the errors measured at
the base station cannot be used to correct the
rover data. Many systems allow data collection
to be synchronized to a standard timing
signal, thereby ensuring a good match when
the error vectors are applied to correct the
roving receiver data.
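
Matching rover fixes to base corrections by time can be sketched as follows. The function name, the timestamps, and the 10-second tolerance are assumptions for illustration; actual software typically synchronizes both receivers to a common time signal rather than searching for the nearest record.

```python
def match_corrections(rover_times, base_times, max_gap_s):
    """Pair each rover time with the nearest base time, or None if too far apart."""
    matches = {}
    for t in rover_times:
        nearest = min(base_times, key=lambda b: abs(b - t))
        matches[t] = nearest if abs(nearest - t) <= max_gap_s else None
    return matches


# Hypothetical observation times in seconds; base_times must not be empty.
rover_times = [10.0, 12.0, 60.0]
base_times = [10.5, 12.5, 14.5]
matched = match_corrections(rover_times, base_times, max_gap_s=10.0)
print(matched)
```

Rover fixes that return None have no usable base measurement and would remain uncorrected.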

Base station data and roving receiver
data must be combined for correction. Data
are often stored and then downloaded from
both receivers, and combined on a computer.
Software provided by most GNSS system
vendors is then used to compute and apply
the differential corrections to the position
fixes. This is known as post-processed differential
correction, as corrections are
applied after, or post, data collection (Figure
5-15, top).


Post-processed differential positioning
is appropriate for many projects. Road locations
may be digitized with a GNSS receiver
mounted to the top of a vehicle. The vehicle
is driven over the roads to be digitized, the
rover data differentially corrected, and then
exported as a data layer suitable for GIS.

Post-processed differential positioning
has one serious limitation. Because precise
positions are not known when the rover is in
the field, post-processing technologies are
useless for precise navigation. A surveyor
recovering buried or hidden property corners
often needs to navigate to within a few tens
of centimeters of a position while in the
field, so that monuments, stakes, or other
markers may be recovered. When using
post-processed differential GNSS, the field
receiver is operating as an autonomous positioning
device, and accuracies of a few
meters to tens of meters are expected. This is
not acceptable for many navigation purposes
because too much time will be spent searching
for the final location.


Real-time Differential Positioning


An alternative GNSS correction method,
known as real-time differential correction,
may be appropriate when precise navigation
is required. Real-time differential correction
requires some extra equipment, and there is
some cost in slightly lower accuracy when
compared to post-processed differential
GNSS. However, the accuracy of real-time
differential correction is substantially better
than autonomous GNSS, and accurate locations
are determined while still in the field.


Real-time differential GNSS positioning
requires a communications link between
base stations and the roving receiver (Figure
5-15, bottom). Typically, the base station is
connected to a radio transmitter and an
antenna. FM radio links are often used due
to their longer range and good transmission
through vegetation and other obstacles, and
cell phone networks are also often used. The
base station collects a GNSS signal and calculates
range distances. The error is calculated
for each range distance. The magnitude
and direction of each error is passed to the
radio transmitter, along with information on
the timing and satellite constellation used.
This continuous stream of corrections is
broadcast via the base station radio and
antenna.

Roving GNSS receivers are outfitted
with a radio, cell phone, or other communication
system, and any receiver within the
broadcast range of the base station may
receive the correction signal. The roving
receiver is simultaneously recording GNSS
data and calculating position fixes. Each
position fix by the roving receiver is
matched to the corresponding correction
from the base station. The appropriate correction
is then applied to each fix and accurate
field locations are computed in real
time.

Real-time differential correction
requires a broadcasting base station; however,
not every user is required to establish a
base station and complete communication
system. For example, the U.S. Coast Guard
has established a set of GPS radio beacons
in North America that broadcast a standardized
correction signal (Figure 5-16). A compatible
GPS receiver near these beacons can
use the signal for differential correction.
These GPS beacon receivers typically have
an additional antenna and electronics for
processing the beacon signal. The beacons form part
of a National Differential GPS system
(NDGPS) under development through a collaboration
of the U.S. Departments of Transportation,
Homeland Security, and others.
NDGPS will support navigation and positioning
in areas distant from the Coast Guard
network.


Real-Time Kinematic and Virtual 
Reference Stations 


The highest accuracy differential correc- 
tion is provided with dual-frequency, carrier 
phase positioning, often used as real-time
kinematic (RTK) GNSS. The amount of ionospheric
delay is different for different frequencies,
so by comparing signals, such as
the GPS carriers L1 and L2, the ionospheric
delays may be estimated and removed.
While single-frequency positions collected
for periods of less than an hour are typically
in error by tens of centimeters (a half-foot)
or more, dual-frequency GNSS are often
accurate to a few centimeters (an inch) or
better.

Figure 5-16: The location of radio beacon stations (dots) in the central U.S. in April, 2015. Distances from the nearest beacon are shown in various shades of gray.
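
The removal of ionospheric delay by comparing two frequencies can be sketched with the standard first-order "ionosphere-free" combination, in which the delay is assumed to scale with one over the frequency squared. That model and the function below are not taken from the text; the range and delay values are hypothetical, while the L1 and L2 frequencies are the published GPS carrier frequencies.

```python
F_L1_HZ = 1575.42e6  # GPS L1 carrier frequency
F_L2_HZ = 1227.60e6  # GPS L2 carrier frequency


def ionosphere_free_range(p1_m, p2_m, f1_hz=F_L1_HZ, f2_hz=F_L2_HZ):
    """Combine ranges measured on two frequencies to cancel first-order delay."""
    f1_sq, f2_sq = f1_hz ** 2, f2_hz ** 2
    return (f1_sq * p1_m - f2_sq * p2_m) / (f1_sq - f2_sq)


# Hypothetical true range and L1 delay; the L2 delay follows the 1/f^2 model.
true_range_m = 22_000_000.0
delay_l1_m = 5.0
delay_l2_m = delay_l1_m * (F_L1_HZ / F_L2_HZ) ** 2

combined = ionosphere_free_range(true_range_m + delay_l1_m, true_range_m + delay_l2_m)
print(combined - true_range_m)
```

The printed residual is near zero, showing that the modeled delay cancels; other error sources, such as multipath and receiver noise, are not removed by this combination.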

RTK is such a powerful technology that
many state and national governments are
establishing a dense constellation of dual-frequency
receivers in a Virtual Reference
Station network (VRS). Stations are spaced
in a network over some region such that a
roving receiver is never more than an acceptable
distance from a base (Figure 5-17). The
systems provide single- or dual-frequency
base station data broadcast in a standard way
over a given radio or cellphone signal, along
with base station information. A roving
receiver may identify the closest or best
local receiver, and compare base signals to
roving signals to obtain positions to within a
few centimeters while in the field.

Networks often distribute correction signals
using Networked Transport of RTCM via
Internet Protocol (NTRIP). This requires
less field equipment than radio-based alternatives
because almost all mobile phones
may connect to the Internet via an NTRIP
client application and to a GNSS receiver via
Bluetooth. This removes the need for a radio
and for a radio license, as required in some
jurisdictions or for some radio frequencies. The
communication range is generally longer, and there
is less chance of radio interference. One
does need a mobile phone, coverage, and
often registration or credentials from the
provider, but these are available in much of
the world. Many GNSS receivers support
differential correction through the NTRIP
protocol.

There are disadvantages to RTK GNSS.
The receivers are more expensive, although
prices are dropping. The roving RTK
receiver must be closer to the base station for
highest accuracies, typically within tens of
kilometers. This requires either a denser network
of base stations, or that the RTK users
set up their own base station. Finally, as with
all carrier phase positioning, satellite signals
must be continuously tracked for longer
periods, although modern receivers have
reduced this time to a few to tens of minutes.


WAAS, Augmentation, and
Satellite-based Corrections


There are alternatives to ground-based 
differential correction for improving the 
accuracy of GNSS observations, although 
these often involve higher costs, longer col- 
lection times, or lower accuracies. One free 
alternative, known as the Wide Area Augmentation
System (WAAS), is administered
by the U.S. Federal Aviation Administration
to provide accurate, dependable aircraft navigation.
The WAAS system is single frequency
and so less accurate than the dual-frequency
systems described above.

WAAS uses a network of ground reference
stations spread across North America to
correct GPS signals. A generalized correction
for each station is broadcast from geostationary
satellites, and applied in real time
for improved accuracy in roving receivers.
Tests indicate individual errors are less than
7 m 95% of the time, and average errors less
than 3 m, an improvement over uncorrected
C/A code (Figure 5-12). WAAS is often
unavailable at extreme northern latitudes,
where equatorial geostationary satellites are
often not visible.


Precise Point Positioning 


Precise point positioning (PPP) is
another alternative to differential correction.
This technique uses precise satellite clock
and orbit measurements to solve for point
locations. These improved satellite ephemerides
(orbit data) are usually only available
a few hours to days after any GNSS
observation. PPP has the advantage of centimeter-level,
worldwide positioning without
need of a base station. Unfortunately, PPP
requires complex calculations on long, uninterrupted
observations for highest accuracies,
e.g., several to tens of hours of
continuous GNSS observations. PPP also
has a few hours to few days delay in processing
because the actual satellite positions
must be re-estimated post facto. There
are a number of free services that do PPP
calculations, e.g., the U.S. NGS OPUS service
or the CSRS-PPP run by the Canadian
government, as well as privately run services,
e.g., a free option in Trimble's CenterPoint
RTX.

PPP is best used when long observation
times are possible, few points are required,
there are relatively unobstructed sky views,
and positions aren't needed within a day of
collection. It is most appropriate for
measuring fixed assets, e.g., when locating a
fixed location for an RTK base station, or for
monumented points from which laser or
other measurements will be collected. PPP
isn't appropriate for measuring most geographic
features, or for most areas where
trees, terrain, or buildings obstruct much of
the sky.

Integrated systems of satellite observations
and communications are available to
provide PPP-like accuracies in near real
time, without RTK connection to a second
ground-based station, and with shorter
observation periods than OPUS or similar
systems. Some of these adopt RTK-like
techniques. Examples include Trimble CenterPoint
RTX, NavCom StarFire, and Veripos
TerraStar. Typically, these systems are
sold as a subscription service, in which a
monthly or annual fee is paid to access the
real-time improved satellite positioning and
other data. These data are often broadcast
via a cellular modem or satellite radio.


A Caution on Datums


Errors may easily be injected into GNSS 
data due to improper datum transformations. 
One must be cautious in using GNSS data, 
either directly, or after applying a differential 
correction, because the datum transforma- 
tion used is often not transparent, and is 
often poorly documented. The U.S.
NAVSTAR GPS system provides a good 
example of the confusion that may occur. 


GPS satellite locations are reported in
the most current ITRF/WGS84 datum,
although they are often converted to other
datums in software before being displayed to
the user. Practices may change in the U.S.
with the adoption of the NATRF2022, but
now the GPS-measured ITRF/WGS84
datums are quite different from the datums
used with most GIS data in the United
States, for example, NAD83(2011). We must
transform data collected in the WGS84
datum to the NAD83(2011).

As noted earlier, ignoring or selecting
the wrong datum transformation will introduce
error into our newly collected data.
GNSS vendors typically provide an option to
report data in a commonly used coordinate
system and specific datum; for example, the
user may set the GNSS receiver to display
UTM or State Plane coordinates in the
NAD83(2011) datum. However, the GNSS
software sometimes does not clearly identify
the datum transformation used. As noted in
Chapter 3, early versions of the NAD83
datum that underlie the UTM system were
and remain up to 2 m (6 feet) different from
the WGS84 datum, so you cannot assume
they are the same, as is common practice.
You must carefully choose the correct datum
transformation in all conversions or you'll
degrade your data.
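
Datum transformations between geocentric coordinates are commonly expressed as a seven-parameter similarity (Helmert) transformation. The sketch below shows the small-angle form under the position-vector sign convention, which is an assumption here since software packages differ on the rotation signs. The parameter values are placeholders, not published values for any real datum pair, and this is not the specific method used by any vendor.

```python
def helmert_transform(xyz, tx, ty, tz, rx, ry, rz, scale):
    """Small-angle seven-parameter transformation of geocentric X, Y, Z in meters.

    tx, ty, tz are translations in meters; rx, ry, rz are rotations in radians
    (position-vector convention); scale is unitless, e.g. parts per million * 1e-6.
    """
    x, y, z = xyz
    k = 1.0 + scale
    return (
        tx + k * (x - rz * y + ry * z),
        ty + k * (rz * x + y - rx * z),
        tz + k * (-ry * x + rx * y + z),
    )


# Placeholder point and parameters, chosen only to show the mechanics.
point = (1_000_000.0, -4_500_000.0, 4_300_000.0)
unchanged = helmert_transform(point, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
shifted = helmert_transform(point, 1.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0)
print(unchanged, shifted)
```

Applying the wrong parameter set, or none at all, silently shifts every coordinate by the datum difference, which is why the transformation in use should always be identified.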

Confusion may be introduced during
differential correction. Here, base station
coordinates define corrections relative to a
point in a defined coordinate system. These
coordinates may be based on a datum different
from the WGS84 or ITRF datum used for
GPS data collection. Appropriate transformations
between datums must be applied to
maintain accuracy. For example, the CORS
network of GPS stations is a common source
of base data for differential corrections. The
coordinates for these base stations were for a
period typically reported in the most recent
CORS realization of the NAD83 datum, but
now most provide coordinates in an ITRF
datum. An appropriate datum transformation
must be applied when using these as a base
for correction if the highest accuracies are to
be maintained. As noted, the differences
between the later NAD83(CORS) datums
and most recent realizations of WGS84 are
typically less than a meter, so introduced
errors may be small relative to the accuracy
required for some intended analyses. However,
many projects require submeter accuracy,
and for some conditions the errors may
be quite large, to tens or hundreds of meters,
depending on the datums and projections
involved. These errors may be avoided at little
cost with the application of appropriate
knowledge of the main datum families,
information on which datums are used, and
information on how to set the software to
apply the correct transformation to the collected
data. This information is typically
provided in the vendor's documentation.


GNSS Applications


Tracking, navigation, field digitizing,
and surveying are the main applications of
GNSS. Navigation is finding a way or route,
and tracking involves noting the location of
objects through time. A common example is
tracking delivery vehicles in near real time.
Large delivery companies would like to
know where their vans are at all times. Vehicles
equipped with a GNSS receiver and a
radio or phone link may report back to a dispatch
office every few seconds. Icons on a
digital map are used to represent vehicles,
and a quick glance can reveal which vehicle
is nearest a delivery or retrieval site, or
which driver overly frequents a donut shop.

Navigation is a second common GNSS
application. GNSS receivers have been
developed specifically for navigation, with
digital maps or compasses set into on-screen
displays (Figure 5-18). These GNSS receivers
and digital maps are extremely specialized
GIS systems. These systems are useful
when collecting or verifying spatial data,
such as to navigate to the approximate vicinity
of field measurement plots.


Field Digitizing


Field digitizing is a primary application 
of GNSS in GIS. Data may be recorded 
directly in the field to update point, line, or 
area locations. Features are visited or tra- 
versed in the field, and an appropriate num- 
ber of GNSS fixes collected. GNSS 
receivers have been carried in automobiles, 
on boats, bicycles, and helmets, or by hand
to capture the coordinate locations of points
and boundaries (Figure 5-19). 

GNSS data are often more accurate than
data collected from the highest-quality digital
images. For example, RTK GNSS data
typically have accuracies better than 5 cm,
and often below 2 cm, while accuracies are
often near 15 to 50 cm for the highest-resolution
satellite images, and for national
aerial image programs. Drones often provide
images with centimeter-level resolutions, but
accuracies are generally in the 30 cm range
or poorer. Precise differential correction of
carrier phase GNSS data often yields centimeter-level
accuracies, far better than can be
obtained from digitizing almost all images.

GNSS is often used to directly digitize
new control points. Remember that control
points are used to correct and transform
image data or maps to real-world coordinates.
Aerial images may be available, and
the coordinates may be unknown for features
visible on the aerial image. Control points
may be difficult or impossible to obtain
directly from surveys or from the information
plotted on existing maps, particularly
when graticule or gridlines are absent.
GNSS offers a direct method for measuring
the coordinates for potential control points
represented on the image or map. Road
intersections or other points may be identified
and then visited with a GNSS receiver.


GNSS-measured control points are the 
basis for almost all current projects that per- 
form analytical correction of aerial imagery 
(see Chapter 6). Most image data are not initially
in a map coordinate system, yet images
are often particularly useful for developing
or updating spatial data. Aerial photographs
contain detailed information. However,
aerial photographs are subject to geometric
distortion. These errors may be analytically
corrected through suitable methods (see
Chapter 6), but these methods require several
control points per image, or at least per
project, when multiple, overlapping aerial
photographs are used. GNSS significantly
reduces the cost of control point collection,
thereby making single- or multiphoto correction
a viable alternative for most organizations
that collect spatial data.


Feature digitizing with GNSS often
involves the capture of both coordinate and
attribute data. Typically, the GNSS receiver
is activated and detects signals from a set of
satellites. A file is opened and position fixes
are logged at a set rate, such as every two
seconds. Attribute data may also be entered,
either while the position fixes are being collected,
or before or after positional data collection.
In some software, the position fixes
may be tagged or identified. For example,
a specific corner may be tagged while digitizing
a line. Multiple features may be collected
in one file and the identities
maintained via attached attributes. Data are
processed as needed to improve accuracy
and converted to a format compatible with
the GIS system in use. GNSS data collection
and data reduction tools often provide the
ability to edit, split, or aggregate collected
data, for example, converting multiple fixes
into a single point average. These functions
may be applied for all position fixes in a file,
or for a subset of position fixes embedded in
a GNSS file.

Large, field-portable displays and
advanced editing software may be combined
with real-time differential correction to
improve field digitizing. Tablet computers
are available with large color screens (Figure
5-20). Scanned digital images may be displayed
with existing digital data. New data
may be input via a GNSS receiver, in real
time, or via penstrokes on the screen. The
operator may digitize new features, edit old
ones, or perform some combination of the
two while in the field (Figure 5-21). Snapping
tolerances, maximum overshoots, and
all other digitizing controls may be applied
in the field, much like when digitizing on-screen
in an office.


GNSS field digitizing is most com- 
monly used for collection of point and line 
features. Multiple position fixes provide 
higher accuracies and are often collected for 
point locations, and for important vertices in 
line data. However, GNSS data collection
for line and area features suffers from a number
of unique difficulties. First, it takes considerable
time to traverse an area, so
relatively large parcels or many small parcels
may be impractical to digitize in the
field. Second, multiple representations of the
same boundary may occur when digitizing
polygonal features. Attempting to retrace the
common boundary wastes time and provides
redundant and conflicting data while field
digitizing. The alternative is to digitize only
the new lines, and snap to “field-nodes,”
much as when capturing data using a coordinate
digitizer (see Chapter 4). This method is
often used, with subsequent editing in a
desktop GIS.

Figure 5-21: Features may be entered and edited in the field using a GPS receiver and appropriate software.

GNSS field software is often optimized
to streamline the input of attributes
that are associated with spatial data. Forms
may provide menus, pick lists, and variable
entry boxes in a predetermined order. This
software often improves attribute data accuracy,
in part by helping avoid blunders. For
example, the entry options for a specific
attribute such as fire hydrant color may be
restricted to red, green, or yellow from a list,
if those are the only possible values. These
attribute entry forms also increase completeness,
in part by ensuring that every variable
is presented to the operator, and these forms
may also be configured to show a warning
when all variables have not been entered.
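
The pick-list and completeness checks described above can be sketched in Python. The form definition, the "condition" field, and the validate_record function are hypothetical and stand in for whatever configuration a given field package provides.

```python
# Hypothetical form: each attribute maps to its allowed pick-list values.
HYDRANT_FORM = {
    "color": {"red", "green", "yellow"},
    "condition": {"good", "fair", "poor"},
}


def validate_record(record, form=HYDRANT_FORM):
    """Return a list of warnings for missing or out-of-list attribute values."""
    problems = []
    for field, allowed in form.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: missing")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' not in pick list")
    return problems


print(validate_record({"color": "red", "condition": "good"}))
print(validate_record({"color": "blue"}))
```

An empty list means the record is complete and valid; any other result could be shown to the operator as a warning before the feature is saved.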
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Single-Fix vs Averaged Accuracy 


Lines or polygons digitized with GNSS 
receivers usually have larger errors than 
those reported in the technical specifications 
or marketing literature. Figure 5-22 illus- 
trates this. A stream network was digitized 
with a professional grade GNSS receiver, 
with an advertised 50 cm (20 inch) "average"
error. The unit was configured to maximize
single-fix accuracy. Multiple passes up
and down the streams were digitized, at normal
walking speeds. The digitized tracks
(thinner, jagged lines) vary notably about the
true stream location (thick, smoother lines)
with errors often in the 3 to 5 meter range
(10 to 16 feet), and as large as 25 meters (82
feet). These observed errors are much larger
than the reported average values for at least
two reasons.


First, manufacturers usually report an average or
distributional accuracy, with some description of
the statistics used. Often they use terms like CEP,
the circular error probability, or one-sigma (1σ)
errors, measures of how frequently you'd expect to
get an error larger than the reported size. These
averages give you an optimistic view of single-fix
accuracy, because averaging reduces variability,
and so the mean error tends toward a stable, lower
value. Errors to the north cancel out errors to the
south, and so a mean of even a few fixes is much
better behaved than a single fix. One-sigma or CEP
measures are a bit more helpful, but again, they
usually don't tell us much about the frequency of
large errors. Unfortunately, we typically digitize
single fixes when collecting lines or polygon
boundaries, so the differences between single-fix
and average errors are important.
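A small numerical sketch may make the difference between single-fix and averaged error concrete. The offsets below are invented for illustration; they are not the Figure 5-22 data. Each pair is an assumed east and north error, in meters, of one fix relative to a known true position.

```python
import math

# Hypothetical east/north errors (meters) for six single fixes
# collected while standing on one point of known position.
fix_errors = [
    (3.0, 1.0), (-2.5, 2.0), (1.5, -3.5),
    (-3.0, -1.0), (2.0, 2.5), (-0.4, -0.4),
]


def horizontal_error(d_east, d_north):
    """Horizontal distance between a fix and the true position."""
    return math.hypot(d_east, d_north)


# Error of each fix taken alone, as when digitizing a line.
single_fix_errors = [horizontal_error(de, dn) for de, dn in fix_errors]

# Error of the averaged position: average the offsets first,
# so errors to the north cancel errors to the south.
mean_east = sum(de for de, _ in fix_errors) / len(fix_errors)
mean_north = sum(dn for _, dn in fix_errors) / len(fix_errors)
averaged_position_error = horizontal_error(mean_east, mean_north)

print("largest single-fix error:", round(max(single_fix_errors), 2))
print("mean of single-fix errors:",
      round(sum(single_fix_errors) / len(single_fix_errors), 2))
print("error of averaged position:", round(averaged_position_error, 2))
```

With these made-up values the averaged position is far closer to the truth than any individual fix, which is why an accuracy figure based on averaging says little about a line digitized from single fixes.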

Second, accuracies are often specified under ideal
conditions, in unobstructed locations with no
trees, buildings, or mountains to block or reflect
satellite signals. While these conditions reign for
much of the world, for many places they do not,
decreasing single-fix accuracy. The data in Figure
5-22 were collected under dense forest, so they
show substantial scatter due to the higher PDOPs
and higher multi-path common in obstructed
environments. As described in the next section, we
typically use several strategies during field data
collection to increase GNSS accuracy and reduce
the need for post-collection editing. But even
employing these, we often must manually edit line
data or polygon boundaries when they are field
digitized, to remove the occasional large error.
Field Digitizing Accuracy and
Efficiency 

Field GNSS collections are affected by two things
we can control, and one thing we cannot. We can
control the quality of the equipment we use, and
how we use it. We have very little control over
objects that interfere with satellite signals in
the field. Terrain, trees, buildings, or other
objects block portions of the sky, causing
temporary interruptions in satellite reception, or
forcing the GNSS receiver to estimate position
from a constantly changing set of satellites.
Maximum collection rates while digitizing in the
field with a GNSS receiver are typically near one
fix per second. Obstructions may increase
collection intervals between fixes to several
seconds or minutes.


Obstructions may halt GNSS field digitizing
entirely if they reduce the number of visible
satellites to three or fewer. Sky obstructions
reduce the efficiency of field digitizing because
more time is spent collecting a given number of
fixes, and personnel must wait for the satellite
constellation to change when satellites are too
few or poorly distributed. Alternately, they may
collect fewer positions, thereby reducing
positional accuracy.

Reductions in the efficiency of GNSS digitizing
depend on the nature of the obstruction, the type
of equipment, the equipment configuration, and
satellite number and position. GNSS signals may
pass through foliage when collecting data below a
forest canopy, although signals become weaker as
they pass through several canopy layers. Satellite
signals are blocked by stems and branches, though
individual satellites are typically obstructed by
stems for relatively short durations. Under dense
canopy, the available satellite constellation may
change frequently; slightly changing the position
of the GNSS antenna, by raising or lowering it, may
result in a new constellation of visible
satellites. Despite these efforts, efficiency
reductions may be substantial, doubling or
tripling collection times, but single-fix
collection times rarely take longer than a few
seconds to minutes when using modern,
professional-grade receivers and when forest
canopy is the primary sky obstruction. Collection
times will increase correspondingly when multiple
fixes are required per feature.

Terrain may block satellites, a significant
problem when the blocked satellites are greater
than 15° above the local horizontal plane.
Satellites less than 15° above the local
horizontal plane are of limited use, even in open
conditions, because they exhibit large range
errors due to atmospheric interference. GNSS
receivers designed for GIS data collection
typically provide settings that automatically
reject satellites below a specified horizon angle.
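The horizon-angle rejection described above amounts to a simple filter. The following is a minimal sketch, assuming each tracked satellite is reported with an elevation angle in degrees; the satellite identifiers, the elevations, and the function name are hypothetical.

```python
def apply_elevation_mask(satellites, mask_deg=15.0):
    """Keep only satellites at or above the horizon-angle mask."""
    return [s for s in satellites if s["elevation_deg"] >= mask_deg]


# Hypothetical satellites tracked at one instant.
visible = [
    {"id": "G05", "elevation_deg": 62.0},
    {"id": "G12", "elevation_deg": 9.5},
    {"id": "R07", "elevation_deg": 15.0},
    {"id": "E21", "elevation_deg": 31.0},
    {"id": "G29", "elevation_deg": 4.0},
]

usable = apply_elevation_mask(visible)

# Digitizing halts with three or fewer satellites,
# so at least four must survive the mask.
enough_for_fix = len(usable) >= 4

print([s["id"] for s in usable], enough_for_fix)
```

In this example five satellites are tracked, but only three clear the 15° mask, so no position would be logged until the constellation changes or the antenna is moved.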

Terrain obstructions often rise above 15°, such as
when mountains, hillslopes, buildings, or canyon
walls reduce the number of visible GNSS satellites
(Figure 5-23). Terrain obstruction often reduces
collection efficiencies and accuracies. Because
the GNSS signals do not pass through soil, rock,
wood, or concrete, any obstructed satellite cannot
be used for GNSS positioning. In some instances, a
short wait may result in a rearrangement of the
satellite constellation, such as from point c to
point b in Figure 5-23. However, on average, an
obstructed sky results in a reduced constellation
of GNSS satellites and higher PDOPs when compared
to flat terrain. This problem is particularly
vexing in urban settings because the horizon
angles change substantially over short distances.
This makes it difficult to predict when GNSS
satellite coverage will be adequate, and thus plan
data collection efforts.


Forest, terrain, and building effects may occur
together, further reducing accuracies and
decreasing efficiencies. This is a common
occurrence in forested, mountainous terrain, and
in urban areas with both tall buildings and mature
trees.

We can use equipment to improve GNSS accuracy and
efficiency. Dual-frequency receivers, described
earlier, are more accurate than single-frequency
receivers, and so improve positioning. RTK or some
other differential correction greatly improves
accuracy. So does the use of an antenna with a
ground plane to eliminate multi-path signals, as
described earlier. None of these may improve GNSS
collection efficiency, that is, the number of
positions collected per unit time, but they may
increase overall data collection efficiency, as
less time is spent editing data post-collection.

Figure 5-24: A range pole in use with a GNSS
receiver.

The use of a range pole is perhaps the easiest,
most common, and often most effective method to
improve collection efficiency (Figure 5-24). A
range pole is an extendable pole on which a GNSS
antenna is mounted. A range pole is often
particularly effective in urban and forested
conditions, where canopy gaps and building
obstructions vary vertically. A range pole
facilitates the search for an acceptable set of
satellites. The antenna is raised and lowered
during data collection as the satellite
constellation changes through time and long
pauses are encountered. A range pole is perhaps
most useful when digitizing point features or
important vertices in line features, when the
receiver remains stationary.


Handheld or backpack-mounted poles commonly
improve efficiency when digitizing with GNSS.
Raising the antenna just a few meters off the
ground avoids low obstructions such as the body
and thick skull of the human operator.

There is often a trade-off between accuracy and
efficiency during field digitizing, particularly
in obstructed locations (Figure 5-25). This series
of graphs shows data collected by Scrinzi and
Floris (1998), in rough terrain and under forest
canopy. These results are dated and so the
absolute numbers are pessimistic for newer
equipment, but the general patterns still hold
true. As shown in the leftmost graph of Figure
5-25, they found that 100% of the possible fixes
may be collected when the average visible horizon
angle is near 15°. They also collected data at
various points in hilly terrain, where the horizon
angle was greater because mountains and ridges
block lower portions of the sky. Efficiency
dropped to near 70% as average visible horizon
angle increased to near 30°. Collections took
about 30% longer or fixes were 30% less frequent
when in valley locations compared to flat terrain.
The center and right panels show the same trends,
with increasing thresholds on improved satellite
arrangements.

While Figure 5-25 was generated with a specific,
for that time, high-quality receiver that was
optimized for field digitizing, the general
patterns are true for all currently available GNSS
systems: efficiency decreases in obstructed
terrain, and the rate of decrease changes with the
allowable PDOP. As GNSS receivers have improved,
and can measure GPS, GLONASS, Galileo, and Compass
GNSS simultaneously, efficiencies and accuracies
in obstructed environments have substantially
increased, such that reasonable efficiencies are
currently obtained under most conditions.


We may improve the efficiency of GNSS digitizing
by altering PDOP and signal strength thresholds,
but this often comes at the expense of decreased
accuracy. Sophisticated receivers allow multiple
settings: a target PDOP, above which the receiver
will search for better satellite constellations,
and a maximum PDOP, above which data collection
will cease. This allows the user to balance the
trade-off between accuracy and efficiency.
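The two-threshold behavior can be written as a small decision rule. This is a sketch only: the threshold values are arbitrary, the function name is hypothetical, and we assume that a receiver keeps logging while it searches for a better constellation between the two thresholds, which may not hold for every receiver.

```python
def pdop_action(pdop, target_pdop=4.0, max_pdop=8.0):
    """Decide what to do with the current fix, given its PDOP."""
    if target_pdop > max_pdop:
        raise ValueError("target PDOP must not exceed maximum PDOP")
    if pdop <= target_pdop:
        return "log"              # geometry is good enough
    if pdop <= max_pdop:
        return "log_and_search"   # usable, but look for better geometry
    return "stop"                 # geometry too poor; cease collection


for pdop in (3.0, 6.0, 9.5):
    print(pdop, pdop_action(pdop))
```

Raising max_pdop in this sketch lets more fixes through, which increases efficiency, but the added fixes come from poorer satellite geometry and so are likely to be less accurate.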

Some GNSS receivers allow adjustments in the
threshold for acceptable signal strength. For
example, satellite signals that pass through a
forest canopy are weaker. Including these weaker
signals improves the number and often the
distribution of satellites, thereby increasing
collection efficiency and perhaps accuracy.
However, weak signals may also result from
reflected or multipath transmissions. As described
earlier, multipath signals have larger range
errors. Lowering the threshold for acceptable
signal strength is likely to increase positional
error because it increases the likelihood of
multipath measurements. However, some data are
often better than none, and relaxing the signal
strength or PDOP threshold for collection is
sometimes the only way to collect data.

GNSS receivers specifically designed for GIS data
collection may use multiple techniques to improve
accuracy and efficiency. Manufacturers have
invested substantially in optimizing antenna
design and collection systems to control these
multiple trade-offs.


The availability of multi-path rejection antennas
is a primary difference between GIS-grade
receivers and recreational receivers costing much
less. Recreational receivers are substantially
less accurate in obstructed terrain because they
often place priority on collecting a fix,
accepting some degradation in accuracy. If you are
lost in the woods you would rather know where you
are within a few meters than wait for a
higher-accuracy signal or constellation.
Recreational receiver thresholds for signal
strength or PDOP are often configured for highest
efficiency and thus lowest accuracy under
obstructed conditions, and these thresholds often
may not be adjusted by the user. Irrespective of
the equipment, there is a trade-off between the
acceptable signal strength and the introduction of
multipath errors. Setting the maximum acceptable
PDOP higher or acceptable signal strength
thresholds lower will increase efficiency of
collection, but often at the cost of increased
error.


Rangefinder Integration 


There are other limits to GNSS data collection.
For example, the need to occupy every vertex and
node in the field is a primary drawback of GNSS
digitizing. Sometimes, it may be dangerous to
physically place the GNSS receiver over each
point, for example, when a stream to be digitized
is in a field full of rutting buffalo. Features
may be difficult to reach, costing the user more
time in travel than in GNSS data collection. This
is particularly common when point features to be
digitized are widely dispersed. Features may be
numerous, intervisible, but separated by a
barrier, for example, a sequence of fence posts or
power poles on opposite sides of a limited access
highway.

Figure 5-26: A laser rangefinder may
substantially ...

Peripheral measuring devices, such as laser
rangefinders, may be attached to GNSS data
collectors to substantially improve field data
collection (Figure 5-26). These devices typically
measure distance with a laser and direction with a
compass. Measurements are made from each occupied
GNSS point to the nearby features of interest. The
target coordinate calculations are often automatic
because direction is measured with an integrated
electronic compass. The rangefinder is pointed at
the feature to be digitized. The system calculates
the observer's position from the GNSS, and this
position is combined with distance and angle
measurements in coordinate geometry to calculate
the feature coordinates. The person operating the
GNSS/laser rangefinder may stand in one location
and collect positions for several to tens of
features, thereby saving substantial travel time.
These systems are most often used to inventory
point features such as utility poles, signs,
wells, trees, or buildings.
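The offset calculation performed by such a system can be sketched as follows. We assume projected coordinates in meters, a horizontal distance, and a compass direction that has already been converted to an azimuth relative to grid north; the function name, the coordinates, and the measurements are hypothetical.

```python
import math


def offset_target(obs_east, obs_north, distance_m, azimuth_deg):
    """Coordinates of a target sighted from an occupied GNSS point."""
    az = math.radians(azimuth_deg)
    # Azimuths are clockwise from north, so east uses the sine
    # and north uses the cosine.
    return (obs_east + distance_m * math.sin(az),
            obs_north + distance_m * math.cos(az))


# One occupied point, several utility poles sighted from it.
observer = (500000.0, 4100000.0)
shots = {"pole_1": (42.0, 10.0), "pole_2": (85.5, 95.0), "pole_3": (61.2, 200.0)}

for name, (distance, azimuth) in shots.items():
    east, north = offset_target(observer[0], observer[1], distance, azimuth)
    print(f"{name}: E = {east:.2f}, N = {north:.2f}")
```

A single GNSS occupation yields coordinates for all three poles, which is the source of the travel-time savings noted above.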

Laser rangefinders are available that can measure
features at distances up to 600 m (2000 ft).
Realized accuracies depend on both the quality of
the GNSS receiver and the distance measuring
subsystem. However, submeter accuracies are
possible under open sky conditions.
GNSS Height Measurement


GNSS are often the easiest way to measure new
heights, but care must be taken to ensure heights
are relative to an appropriate vertical datum.
GNSS heights are typically determined as a height
above an ellipsoid, or HAE, and many less
expensive GPS receivers have no options to report
any other height. The user should carefully read
the documentation to determine the type of height
reported by the GNSS unit. Typically, it is
ellipsoidal height, although some units provide an
estimate of orthometric height. Since the two can
differ by up to 100 meters (330 feet), the user
should determine which height is reported.

As described in Chapter 3, our standard height
reference is a vertical datum and not an
ellipsoid, so we must convert any provided HAEs to
an orthometric height, relative to a datum, prior
to most uses. As explained in Chapter 3, we
calculate the orthometric height via the equation:


H = h - N        (5.2)


The GNSS often provides h, the ellipsoidal height,
and we may use spatial models developed by most
governments to estimate N, the geoidal height, for
any location. In the U.S., these models have been
developed and documented by the National Geodetic
Survey, and are available at a geoids page
(http://www.ngs.noaa.gov/GEOID/). These models
have been incorporated into most GNSS receivers
designed for GIS data collection, and so the
conversion may be transparent, or available as a
system setting.
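When the receiver reports only ellipsoidal height, the conversion can be applied afterward. The sketch below assumes that h and N are in the same units and refer to the same ellipsoid; the numeric values are invented, and in practice N would come from a geoid model.

```python
def orthometric_height(h_ellipsoidal, n_geoid):
    """H = h - N, with all heights in the same units."""
    return h_ellipsoidal - n_geoid


h = 250.00   # ellipsoidal height from the GNSS, meters (hypothetical)
N = -28.50   # geoidal height from a geoid model, meters (hypothetical)

H = orthometric_height(h, N)
print(f"H = {H:.2f} m")
```

Note that N carries a sign. Where the geoid lies below the ellipsoid, N is negative and the orthometric height is larger than the ellipsoidal height, as in this example.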

The calculated height can be no more accurate than
the geoid model or the GNSS measurements. For much
of the world there is a relatively sparse set of
geoidal height measurements, and geoid models are
accurate to within a few to tens of centimeters
(inches to a few feet). Any error in the geoid
estimation translates directly to an error in
estimated height.


orthometric height = ellipsoidal height - geoidal height 


H=h-N 
If the receiver does not support geoidal height
estimation, and if the user has no access to
computing or software required for a geoid model,
one may estimate geoidal height by referencing a
nearby control point, for example, an NGS control
data sheet in the United States. These sheets
usually provide geoidal heights for listed points.
For many regions of the world, geoidal heights do
not vary rapidly across space, and the spatial
variability of geoidal heights may be estimated by
retrieving heights for several nearby height
control points. The nearest, an average, or some
similar combination of geoidal heights may be used
in equation (5.2) to calculate orthometric height,
given a measurement of ellipsoidal height from the
GNSS. Variation in geoid height among the nearby
stations should give an estimate of the
variability in local geoid heights, and the added
local uncertainty in estimating orthometric height
using equation 5.2.
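The control-point approach can be sketched as below. The control point coordinates and geoidal heights are invented for illustration, and the function name is hypothetical. The sketch reports the nearest value, the simple average, and the range of N among the points as a rough indicator of local variability.

```python
import math
import statistics

# Hypothetical nearby control points:
# (easting m, northing m, geoidal height N m)
control_points = [
    (1000.0, 2000.0, -28.42),
    (3500.0, 1800.0, -28.51),
    (2200.0, 4100.0, -28.36),
]


def local_geoid_height(east, north, points):
    """Summarize geoidal heights from control points near a location."""
    nearest = min(points, key=lambda p: math.hypot(p[0] - east, p[1] - north))
    values = [p[2] for p in points]
    return {
        "nearest": nearest[2],
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
    }


estimate = local_geoid_height(1500.0, 2300.0, control_points)

h = 251.20                      # measured ellipsoidal height, meters
H = h - estimate["mean"]        # equation 5.2, using the average N
print(estimate, round(H, 2))
```

Here the three geoidal heights span 0.15 m, so choosing the nearest point rather than the average changes the computed orthometric height by only about a centimeter.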

In addition to geoidal height uncertainty,
vertical GNSS height measurements are not as
accurate as horizontal measurements, and so
require greater care, longer acquisition times,
and more stringent processing for a given level of
accuracy. GNSS signals are not constrained as much
in the vertical direction, because satellites are
not visible on the opposite side of the Earth.
Range uncertainty is not constrained for the
near-vertical satellites, and so vertical
measurement errors are larger. If using GPS for
precise height measurements, occupation times must
be longer to obtain a height accuracy similar to
horizontal positional accuracy.


GNSS Tracking 


GNSS tracking of people, vehicles, packages, or
animals is an innovative and growing application
of GNSS. GNSS receivers are routinely placed on
trucks, ships, buses, boats, or other transport
vehicles. These receivers are often part of
systems that include information on local
conditions, speed of travel, and perhaps the
condition of the shipped equipment or cargo.

GNSS is also increasingly applied to track
individual organisms. This is revolutionizing
animal movement analysis because of the frequency
and density of points that may be collected
(Figure 5-28). More position fixes can be
collected in a month using GNSS equipment than may
be collected in a decade using alternative methods
(Figure 5-29).

Animal movement analysis has long been based on
observation of recognizable individuals. Each time
a known animal is seen, the location is noted. The
number of position fixes is often low, however,
because some animals are difficult to spot,
elusive, or live in areas of dense vegetation or
varied terrain. Early alternatives to direct human
observations were based primarily on
radiotelemetry. Radiotelemetry involves the use of
a transmitting and receiving radio unit to
determine animal location. A transmitting radio is
attached to an animal, and a technician in the
field uses a radio receiver to determine the
position of the animal. Measurements from several
directions are combined, and the approximate
location of the animal may be plotted.

GNSS animal tracking is a substantial improvement
over previous methods. GNSS units are fit to
animals, usually by a harness or collar (Figure
5-28). The animals are released, and positional
information is recorded by the GNSS receiver.
Logging intervals are variable, from every few
minutes to every few days, and data may be
periodically downloaded via a radio link. Systems
may be set up with an automatic or radio-activated
drop mechanism so that data may be downloaded and
the receiver reused. While only recently
developed, GNSS-based animal tracking units are
currently in use on all continents in the study of
threatened, endangered, or important species.

GNSS tracking for individual vehicles or fleets of
vehicles typically involves a number of subsystems
(Figure 5-30). GNSS receivers and radio
transmitters must be placed on each vehicle to
record and transmit position. Satellite or
ground-based receiver networks collect and
transmit positional and other data to a computer
running a tracking and management program that may
be used to display, analyze, and control vehicle
movement. Information or instructions may be
passed back to vehicles on the road.

GNSS-aided vehicle management may be combined with
other spatial data in a GIS framework to add
immense value to spatial analyses. Vehicle
location can be monitored in real time, and
compared to delivery locations. Delivery planning
may be optimized and delivery windows specified
with much greater accuracies. This in turn may
substantially reduce costs, increase data
gathering, and improve profits for participating
businesses. Vehicles may be dispatched more
efficiently, recurring problems identified, and
solutions more quickly found.

Figure 5-29: GNSS data for wildebeest in the
Serengeti and Ngorongoro Crater regions of
Tanzania.

Innovative uses of GNSS are arising almost daily
as this technology revolutionizes positional data
collection. GNSS equipment has been interfaced
with grain harvesting equipment. Grain production
is recorded during harvest, so that yield and
grain quality are mapped every few meters in a
farm field. This allows the farmer to analyze and
improve production on a site-specific basis, for
example, by tailoring fertilizing applications for
each square meter in the field. The mix of
fertilizers may change with position, again
controlled by a GNSS receiver and software carried
aboard a tractor.
Optical and Laser Coordinate Surveying


Historically, coordinate surveys from optical
instruments such as transits, theodolites, and
electronic distance meters were the primary means
of collecting geographic data. While these methods
are slowly being replaced by satellite-based
positioning and ground-based lasers, they are
still quite common, and any competent GIS user
should be familiar with optically based field
surveying methods. Spatial data layers are often
produced directly from field surveys, or from
field surveys combined with measurements on aerial
photographs.

Surveying is particularly common for highly valued
data. Real estate in upscale markets may be valued
at hundreds to thousands of dollars per square
meter. Zoning ordinances often specify the minimum
distances between improvements and property
boundaries. These factors justify precise and
expensive coordinate surveys (Figure 5-31). Other
commonly surveyed features are power lines, fiber
optic cables, and utilities.


Plane surveying is horizontal surveying based on a
planar (flat) surface. The flat surface assumption
provides significant computational advantages,
because the mathematics are substantially less
complicated than those required for geodetic
surveying. The coordinate system for a plane
survey is usually defined by a map projection,
with a known point serving as the starting
location for the survey. In U.S. urban areas, these
are typically State Plane coordinates, or if
defined, county coordinate systems.

In plane surveying, we typically assume plumb
lines are perpendicular to the surface at all
points in the survey. A plumb bob or weight is
suspended from a string, and is assumed to hang in
a vertical direction and intersect the plane
surface at a 90° angle.


Figure 5-31: Surveying establishes the coordinates
for most property lines.


This is a valid assumption when the errors
inherent with ignoring the Earth's curvature are
small compared to the accuracy requirements of the
survey, or to the errors inherent in the survey
measurements themselves. The
distance error due to assuming a flat rather 
than curved surface over 10 km (6 mi) is 
0.72 cm (0.28 in). Therefore, plane surveys 
are typically restricted to distances under a 
few tens of kilometers. This restriction is 
met in many surveys, and a substantial 
majority of the lines and points surveyed to 
date have been measured using plane sur- 
veying methods. Plane surveying is suffi- 
cient for most subdivisions, public works, 
construction projects, and property surveys. 


Historically, plane surveys have been 
conducted with optical instruments similar 
to those described for geodetic surveys. 
These instruments typically have angle 
gages in the horizontal and vertical planes 
and an optical sight, usually with some 
degree of telescopic magnification. The 
instruments have various names, including, 
in increasing order of sophistication and 
capabilities, a level, a transit, a theodolite,
and a total station (Figure 5-32).


Figure 5-32: A surveying instrument for collecting
coordinate geometry data.
Plane surveys typically start at or are
traceable back to a larger survey network 
through Bench Marks (precisely located 
points described in Chapter 3), or through 
local or project-specific control points estab- 
lished from high-accuracy GNSS positions. 
These marks or control points often serve as 
starting and ending points of a survey, to 
allow accuracy verification. 


Distance and angle measurements are the primary
field activities in plane surveying. Distances are
measured between two survey stations, which are
points occupied on the ground. The direction is
specified by an angle between a standard
direction, usually north or south, and the
direction of the surveyed line between the two
stations (Figure 5-33). The distance is in some
standard units, for example, standard
international meters.


There are two common ways of specify- 
ing angles. The first uses the azimuth. An 
azimuth angle is measured in a clockwise 
direction, typically relative to grid or
geographic north (Figure 5-33). Azimuths vary
from 0 to 360 degrees. Note that azimuths 
typically reference grid north although they 
may also be specified relative to magnetic 
north, so care should be taken in clarifying 
which reference is used. 


Azimuth Angles

Angles may also be specified by bearings, which
use a north or south reference direction, an angle
amount, and an east or west angle direction
(Figure 5-34). The reference direction is either
north or south, and the angle direction is east or
west. The angle and direction are written
together, for example, N 30° E or S 45° W.


We can convert between azimuth and 
bearing angles. Conversion from bearing to 
azimuth involves noting the reference direc- 
tion (N or S), and adding or subtracting from 
constants, depending on quadrant (Figure 5-35). We
often convert from bearings to azimuths when
calculating positions from a sequential set of
distance and angle measurements that form a
survey.
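The quadrant rules for converting a bearing to an azimuth can be written as a short function. This is a sketch using the usual conventions (azimuths clockwise from north, bearing angles between 0 and 90 degrees); the function name and argument layout are our own.

```python
def bearing_to_azimuth(reference, angle_deg, direction):
    """Convert a bearing such as N 30 E to an azimuth in degrees.

    reference: "N" or "S"; direction: "E" or "W";
    angle_deg: bearing angle, 0 to 90 degrees.
    """
    reference = reference.upper()
    direction = direction.upper()
    if not 0 <= angle_deg <= 90:
        raise ValueError("bearing angle must be between 0 and 90 degrees")
    if reference == "N" and direction == "E":
        return angle_deg                  # northeast quadrant
    if reference == "S" and direction == "E":
        return 180 - angle_deg            # southeast quadrant
    if reference == "S" and direction == "W":
        return 180 + angle_deg            # southwest quadrant
    if reference == "N" and direction == "W":
        return (360 - angle_deg) % 360    # northwest quadrant
    raise ValueError("reference must be N or S, direction E or W")


for bearing in (("N", 30, "E"), ("S", 30, "E"), ("S", 30, "W"), ("N", 11, "W")):
    print(bearing, "->", bearing_to_azimuth(*bearing))
```

The modulo in the northwest case keeps a bearing of N 0 W at an azimuth of 0 rather than 360.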

Many surveys are traverses, a series of connected
lines that have a marked beginning and ending
point. Traverses typically start at a known
control point, or start at a point that has been
referenced to a known control point. As described
in the preceding sections, the control points are
often part of a geodetic control network, or part
of a subnetwork established by a municipal
surveyor. A distance and angle are measured from
the control point to the first survey station.
Coordinate geometry (COGO) may be used to
calculate the station coordinates. Subsequent
distance and angle measurements may be taken, and
in turn used to calculate the coordinates of
subsequent stations. A traverse may be open, with
a different beginning and ending point, or closed,
with the traverse eventually connecting back to
the starting location. Most of the millions of
miles of property lines in North America have been
established via plane surveys of open and closed
traverses.

Bearing Angles

Figure 5-35: Conversion formulas from bearing to
azimuth, by quadrant.


Coordinate geometry consists of a start- 
ing point (a station) and a list of directions 
(bearings) and distances to subsequent sta- 
tions. The COGO defines a connected set of 
points from the starting station to each
subsequent station. A sample COGO description
follows:

"The starting point is a 1-inch iron rod that is
approximately 102.4 ft north and 43.1 ft west of
the northeast quarter of the southeast quarter
section of section 16 of Township 24 North, Range
16 East, of the 2nd Principal Meridian. Starting
from the said point, thence 102.7 ft on a bearing
north 72.3 degrees east, to a 1-inch iron pipe;
thence 429.6 ft on a bearing south, 64.3 ...


Basic trigonometric functions are used 
to calculate the coordinates for each survey 
station. These stations are located at the ver- 
tices that define lines or areas of interest. In 
the past, these distance and bearing data 
were manually plotted onto paper maps. 
Most survey data are now transferred 
directly to spatial data formats from the sur- 
veying instrument or associated software. 


Field measurements may be directly entered and
coordinate locations derived in the GIS software,
or the coordinate calculations may be performed in
the surveying instrument first. Many current
surveying instruments contain an integrated
computer and provide for digital data collection
and storage. Coordinates may be tagged with
attribute data in the field, at the measurement
location. These data are then downloaded directly
from a coordinate measuring device to a computer.
Specialized surveying programs may be used for
error checking and other processing. Many of these
surveying packages will then output data in
formats designed for import into various GIS
software systems.

Coordinate geometry (COGO) using bearing angles

COGO calculations are illustrated on the left side
of Figure 5-36. Starting from a known coordinate,
x0, y0, we measure a distance L and an angle. We
may then calculate the distances in the x and y
directions to another set of coordinates, x1 and
y1. The coordinates x1 and y1 are obtained by
addition of the appropriate trigonometric
functions. COGO calculations may then be repeated,
using the x1 and y1 coordinates as the new starting
location for calculating the position of the next
traverse station.

The right side of Figure 5-36 shows a sequence of
measurements for a traverse. Starting at x0, y0,
the distance A and bearing angle, here 45°, are
measured to station xa, ya. The bearing and
distance are then measured to the next station,
with coordinates xb, yb. Distances and angles are
measured for all subsequent stations. Starting
with the known coordinates at the starting
station, x0, y0, coordinates for all other stations
are calculated using COGO formulas.


Assigning the proper signs to dx and dy
is important in COGO calculations. An 
incorrect sign for any leg of the traverse will 
propagate through all subsequent coordi- 
nates, causing each to be in error. The proper 
sign is obtained when directions are 
expressed as azimuths and a standard set of 
formulas is used. 
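A traverse computed with azimuths and one fixed pair of formulas can be sketched as follows. The starting coordinates and legs are hypothetical; the example uses a closed square traverse so that the computed closure error can be checked against the known starting point.

```python
import math


def traverse(start_x, start_y, legs):
    """Compute station coordinates for a traverse.

    legs is a list of (distance, azimuth in degrees) pairs.
    Returns all stations, beginning with the starting point.
    """
    stations = [(start_x, start_y)]
    x, y = start_x, start_y
    for distance, azimuth_deg in legs:
        az = math.radians(azimuth_deg)
        x += distance * math.sin(az)   # dx takes its sign from the sine
        y += distance * math.cos(az)   # dy takes its sign from the cosine
        stations.append((x, y))
    return stations


# Hypothetical closed traverse: north, east, south, then west.
legs = [(100.0, 0.0), (100.0, 90.0), (100.0, 180.0), (100.0, 270.0)]
stations = traverse(1000.0, 1000.0, legs)

# Closure error: distance between the last station and the start.
closure = math.hypot(stations[-1][0] - stations[0][0],
                     stations[-1][1] - stations[0][1])

for station in stations:
    print(f"x = {station[0]:.2f}, y = {station[1]:.2f}")
print(f"closure error = {closure:.6f}")
```

Because each station is built on the previous one, a sign mistake in any leg would shift every later station and show up as a large closure error in a closed traverse.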

The trigonometric sine and cosine functions return the proper magnitude and
direction of d_x and d_y when applied to azimuth angles. If traverse angles are
provided as bearings, they are typically converted to azimuths first, using the
rules shown in Figure 5-35, remembering to convert the measured angles to
radian units if those are required by the spreadsheet or computer language used
for calculations. Sine and cosine values are then calculated and multiplied by
the traverse leg distance, resulting in d_x and d_y values of the correct
length and direction. Examples for all four quadrants are shown in
Figure 5-37.
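The bearing-to-azimuth step can be sketched in Python. This is a minimal
illustration, not a reproduction of Figure 5-35: it assumes the usual quadrant
rules, with azimuths measured clockwise from north and a bearing written as a
north or south reference, an angle of 0 to 90 degrees, and an east or west
direction. The function name bearing_to_azimuth is hypothetical.

```python
def bearing_to_azimuth(ns, angle_deg, ew):
    """Convert a quadrant bearing such as N33°E to an azimuth in degrees.

    Assumes azimuths are measured clockwise from north (0 to 360 degrees).
    """
    ns, ew = ns.upper(), ew.upper()
    if ns not in ("N", "S") or ew not in ("E", "W"):
        raise ValueError("bearing must be written as N/S, angle, E/W")
    if not 0 <= angle_deg <= 90:
        raise ValueError("bearing angle must be between 0 and 90 degrees")
    if ns == "N" and ew == "E":
        return float(angle_deg)          # northeast quadrant
    if ns == "S" and ew == "E":
        return 180.0 - angle_deg         # southeast quadrant
    if ns == "S" and ew == "W":
        return 180.0 + angle_deg         # southwest quadrant
    return (360.0 - angle_deg) % 360.0   # northwest quadrant
```

For example, bearing_to_azimuth("S", 49, "E") gives an azimuth of 131 degrees,
which can then be passed to the sine and cosine calculations.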


Az = 349°, L = 110

$$d_x = L \sin(Az) = 110 \cdot \sin(349 / 57.295779) = -20.99$$

$$d_y = L \cos(Az) = 110 \cdot \cos(349 / 57.295779) = 107.98$$
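The COGO calculation for a traverse can be sketched in a few lines of Python.
This is an illustrative sketch under stated assumptions: azimuths are in
degrees clockwise from north, x increases to the east, and y increases to the
north. The function names cogo_leg and traverse are hypothetical.

```python
import math


def cogo_leg(x, y, azimuth_deg, distance):
    """Return the next station (x, y) and the offsets (dx, dy) for one leg."""
    az = math.radians(azimuth_deg)   # trigonometric functions expect radians
    dx = distance * math.sin(az)     # offset in the x (east) direction
    dy = distance * math.cos(az)     # offset in the y (north) direction
    return x + dx, y + dy, dx, dy


def traverse(x0, y0, legs):
    """Compute station coordinates from a list of (azimuth_deg, distance)."""
    stations = [(x0, y0)]
    x, y = x0, y0
    for azimuth_deg, distance in legs:
        # Each computed station becomes the start of the next leg.
        x, y, _, _ = cogo_leg(x, y, azimuth_deg, distance)
        stations.append((x, y))
    return stations
```

Because sine and cosine carry the sign, an azimuth of 349 degrees with a
distance of 110 yields a negative x offset and a positive y offset without any
quadrant-specific logic.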


GNSS is now used for most measurements farther than a few hundred meters.
COGO is more commonly applied in GIS when collecting data with a GNSS receiver
in combination with a laser rangefinder. Laser rangefinders emit a focused,
coherent beam of light to calculate distance. The maximum range depends on the
size and reflective properties of the target, but many moderately priced lasers
are accurate up to several hundred meters. Electronic compasses must be
periodically calibrated and adjusted for proper magnetic declination, but can
be quite accurate. Rangefinders often also have vertical angle gages, because
all measurements are assumed parallel to the datum plane, and non-horizontal
measurements must be adjusted. Careful laser rangefinder measurements can be
accurate to a centimeter over a 100 m distance, improving the efficiency of
GNSS data collection. Terrestrial, three-dimensional lasers are rapidly
becoming common, and although currently used primarily for structure analysis,
they may be used for GIS data entry.


These systems emit narrow, directed laser 
pulses. By carefully measuring the horizon- 
tal and vertical angles relative to the estab- 
lished coordinate system, these laser 
distance measurements can be converted to 
three-dimensional coordinates via coordinate geometry. GNSS systems typically
provide the location of the laser at the time of data capture, but additional
measurements are often necessary to establish an initial or reference pointing
direction. These data are then used to generate two- or three-dimensional data
layers for spatial databases.
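The conversion from a laser distance and two angles to a three-dimensional
coordinate can be sketched as below. This is a simplified illustration, not a
description of any particular scanner: it assumes the horizontal angle is an
azimuth in degrees clockwise from north, the vertical angle is measured upward
from the horizontal, and the instrument position (x0, y0, z0) is already known,
for example from GNSS. The function name laser_point is hypothetical.

```python
import math


def laser_point(x0, y0, z0, azimuth_deg, vertical_deg, slope_distance):
    """Convert one laser range and its angles to x, y, z coordinates."""
    az = math.radians(azimuth_deg)
    va = math.radians(vertical_deg)
    # Reduce the slope distance to its horizontal component.
    horizontal = slope_distance * math.cos(va)
    x = x0 + horizontal * math.sin(az)
    y = y0 + horizontal * math.cos(az)
    z = z0 + slope_distance * math.sin(va)
    return x, y, z
```

The horizontal reduction is the same adjustment needed for any non-horizontal
rangefinder measurement before it is used in a planar COGO calculation.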

Terrestrial three-dimensional laser systems collect billions of points, and
collections from multiple locations must be combined through three-dimensional
reconstructive models to create complete digital representations of real-world
objects (Figure 5-38). As software and computer systems improve,
three-dimensional terrestrial lasers will become common.


Figure 5-38: Three-dimensional scanning lasers use coordinate geometry to
record comprehensive measurements (courtesy Leica Geosystems).


Summary 


GNSS is a satellite-based positioning
system. It is composed of user, control, and 
satellite segments, and allows precise posi- 
tion location quickly and with high accuracy. 

GNSS is based on range measurements. 
These range measurements are derived from 
measurements of a broadcast signal that may 
be either coded or uncoded. Uncoded, carrier 
phase signals are the basis for the most pre- 
cise position determination, but are of lim- 
ited use for locating features due to 
measurement requirements. Code phase
measurements are primarily used for feature 
collection and entry into GIS. Range mea- 
surements from multiple satellites may be 
combined to estimate position. 


GNSS positional estimates contain error due to uncertainties in satellite
position, atmospheric and ionospheric interference, multipath reflectance, and
poor satellite geometry. These uncertainties vary in time and space.

There are a number of ways to ensure the highest accuracy when collecting GNSS
data. Perhaps the greatest improvement comes from differentially correcting
GNSS positions. Differential correction is based on simultaneous GNSS
measurements at a known base location and at unknown field locations. Errors
are calculated for each position fix at the base station, and subtracted from
the field collections to improve accuracy. Accuracy may also be improved by
collecting with low PDOPs, averaging multiple position fixes for each feature,
avoiding multipath or low horizon signals, and using a GNSS receiver optimized
for accurate GIS data collection.
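The idea behind differential correction can be sketched as follows. This is a
deliberately simplified illustration that applies the correction to final
coordinates in a projected system; it assumes the base and field fixes are
simultaneous and share the same errors, and it is not how any particular
receiver or service implements the correction. The function name
differential_correction is hypothetical.

```python
def differential_correction(base_known, base_measured, field_measured):
    """Subtract the error observed at the base from a field position fix.

    Each argument is an (easting, northing) pair in the same coordinate system.
    """
    # Error at the base: what GNSS reported minus the surveyed truth.
    error_e = base_measured[0] - base_known[0]
    error_n = base_measured[1] - base_known[1]
    # Apply the same error, with opposite sign, to the field fix.
    return (field_measured[0] - error_e, field_measured[1] - error_n)
```

If the base fix falls 2.5 m east and 2.0 m south of its known position, each
simultaneous field fix is shifted 2.5 m west and 2.0 m north.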

GNSS is most commonly used in GIS to digitize features in the field, either
for primary data collection, to update existing data, or for secondary data
collection, to support orthoimage creation. Terrain, buildings, or tree canopy
commonly obstruct the sky, leading to reduced accuracy and efficiencies.
Modifying PDOP and signal strength thresholds to account for these obstructions
may increase collection efficiencies, but often at the expense of reducing
accuracies. Specialized antennas and firmware help, and these are commonly
available on GIS-grade receivers, but not on commercial receivers.

GNSS receivers are also used for tracking, navigation, and field surveying.
Vehicle tracking applications require GNSS, transmission, and interpretation
subsystems, and are becoming widely applied. Animal and human movements are
increasingly being tracked via GNSS.


Study Questions


5.1 - Describe the general components of GNSS, including the three common
segments and what they do.


5.2 - What is the basic principle behind GNSS positioning? What is a range
measurement, and how does it help you locate yourself?


5.3 - Describe the GNSS signals that are broadcast, and the basic difference
between carrier and coded signals.


5.4 - How many satellites must you measure to obtain a three-dimensional
position fix?


5.5 - What are the main sources and relative magnitudes of uncertainty in GNSS 
positioning? 


5.6 - How accurate is GNSS positioning? Be sure you specify a range, and
describe under what conditions accuracies are at the high and low end of the
range.


5.7 - What is a dilution of precision (DOP)? How does it affect GNSS position
measurements?


5.8 - Which of the following figures depicts the lowest and highest PDOPs, assuming
натта зн P 


[Figures for question 5.8 not reproduced.]


5.9 - What is the rank order, highest to lowest, for accuracy of GNSS data
given the satellite constellations in the following figures, assuming the
observer is near the center of the drawn surface?


[Figures for question 5.9 not reproduced.]


5.10 - Describe the basic principle behind differential positioning. 


5.11 - What are the differences between post-processed and real-time differential 
positioning? 

5.12 - What is the primary source of error reduced with a dual-frequency GNSS
receiver?

5.13 - Place these in order of accuracy, assuming the best equipment and practices 
applied to data collection: 

a) dual frequency, real-time kinematic positioning,

b) precise point processing, and

c) post-processed, dual frequency positioning.
5.14 - How is GNSS accuracy affected by the local terrain horizon? How is it affected
by canopy cover or building obstructions? Why does positional accuracy change as 
these conditions change? 


5.15 - How are GNSS data accuracy and efficiency (points collected per given
time interval) related when collecting data in obstructed environments? Why?
How is this controlled by field personnel?


5.16 - What is WAAS? Is it better or worse than ground-based differential
positioning?


5.17 - Why are distance measurement devices and offsets used when collecting
GNSS data?


5.18 - What is COGO? 


5.19 - Complete the table below, calculating missing elements according to the
formulas presented in this chapter. Distances are in meters, the angle θ in
degrees.




5.21 - Fill in the missing cells, converting points between azimuths and bearings.


wlr lela |ala ls [> 


Azmuth| 138° зо | 81422” 


Bearing 512% | N33°E | 549°Е | nesw 
5.22 - Fill in the missing cells, converting points between azimuths and bearings.


Point | 1 2 3 4 5 6 7 
Azimuth} 278° | 42° 199° 245° |108^1422"| 
Bearing | N82W S7TE муу 


5.23 - Complete the table below for a traverse with the listed distances and
angles (a rough sketch may help with the calculations).
Note that many spreadsheet, calculator, and trigonometric functions require
input in radians (approximately 57.2958 degrees = 1 radian).

Starting point P0: X = 101283, Y = 60964

Pant 10] Azwar] отока oema x [pev | | v 


pi | sea | wa | esa | 1030 | 101937] 63994 
т | я> | 207 | 2069| -&x [104004 61933 
P3 | 1233 | 305 
pe | 225| ws 
ps | 2739| zoe 
Pe | mao | юг 103290] 60868 


5.24 - Complete the table below for a traverse with the listed distances and
angles given as azimuth degrees (drawing a rough sketch may help with the
calculations).
Note that many spreadsheet, calculator, and trigonometric functions require
input in radians (approximately 57.2958 degrees = 1 radian).


Starting point P0: X = 1200, Y = 400


Point 10] Azimuth] Distance] Dena x [Derey] xX | у 
тп | os | ws | tose | -92 | 13000] зов 
pe | me | m | во | ооз 

ps | жг | 208 

pe | 6 | e 

s| [эз 

Pe | юз | гв 11926 | 3992 
5.25 - Complete the table for points 2 through 5, which have the listed
GNSS-determined latitudes, longitudes, and ellipsoidal heights. The NGS data
sheets for control locations near the measured points are listed, and you may
use these to look up appropriate geoidal data.


L3 1 2 3 1 5 
комны | rosae | моны» | moron | oowoo | moan 
шшш vun. | EEE. | пыт! юш) шы 


Latitude | aririoe | sesrsez| aezens | ате аот | avaro 


Longitude | oiris | exirssz | seat оз | 10929 аз 122750 10 


ERE | гиз | тш | мш | гюз | з» 
Ef Û z392 | чэ 
Осоке 


Height (m) | 23842 


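5.26 - Complete the table for points 2 through 5, which have the listed
GNSS-determined latitudes, longitudes, and ellipsoidal heights. The NGS data
sheets for control locations near the measured points are listed, and you may
use these to look up appropriate geoidal data.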


; A + 5 
rept, | miss en mum 


aso аваг | 3017 2977 | 324 15 | eco aar | arar ssr 


Longitude | 93°00 me | srarsas | naras saz | 109*30 202 | ues 2167 


sth | mer | шз | юз | xo | nen 
сеооа 
cem | ume | use 


Orthomere| 
Height (m) | 28015 
6 Aerial and Satellite Images


Introduction 


Aerial and satellite images are common data sources for GIS. Images are data
recorded from a distance, and are often referred to as remotely sensed data.
Remotely sensed data come in many forms; however, in the context of GIS we
usually use the term to describe aerial images taken from aircraft using
digital cameras, satellite images recorded with satellite scanners, or active
systems using laser or radar sensors. Until the 1970s, most mapping images were
taken with film and aerial cameras. Digital aerial cameras have replaced most
film cameras, and satellite scanners have found wide use. Images are a rich
source of spatial information that have been used in mapping for more than
seven decades.

Remotely sensed images are valuable 
sources of spatial data for many reasons, 
including: 
Broad-area coverage - images capture data from large areas at a relatively low
cost and in a uniform manner (Figure 6-1). It would take months to collect
enough ground survey data to accurately produce a topographic map for 10 km.
Images of a region this size may be collected in a few minutes and the
topographic data extracted and interpreted in a few weeks.

Extended spectral range - cameras and scanners can detect light from
wavelengths outside the range of human eyesight. Some cameras and scanners
record near- and mid-infrared wavelengths, a portion of the light spectrum that
the human eye cannot sense, and useful for distinguishing among features.
Aerial and satellite scanners sense thermal wavelengths. This expanded spectral
range allows us to detect features or phenomena that appear invisible to the
human eye.

Geometric accuracy - aerial images are the source of many of our most accurate
large-area maps. Under most conditions, aerial images contain geometric
distortion due to imperfections in the camera, lens, or sensing systems, or due
to camera tilt or terrain variation in the target area. Satellite scanners may
also contain errors due to the imaging equipment or satellite platform.
However, distortion removal methods are well developed, and cameras and imaging
scanners specifically built for quantitative mapping are combined with image
correction techniques to yield spatially accurate images.

Permanent record - an image is fixed in time. Comparison of conditions over
multiple dates, or determination of conditions at a specific date in the past
are often quite valuable, and remotely sensed images are often the most
accurate source of historical information.


Basic Principles


The most common forms of remote sensing are based on reflected electromagnetic
energy. When energy from the sun strikes an object, a portion of the energy is
reflected. Different materials reflect different amounts of incoming energy, in
different wavelengths, and this differential reflectance gives objects their
color.


Sunlight is the principal energy form detected in remote sensing for GIS. Light
energy is characterized by its wavelength, the distance between peaks in the
electromagnetic stream. Each color has a distinctive wavelength; for example,
we perceive light with wavelengths between 0.4 and 0.5 micrometers (µm) as
blue. Light emitted by the sun is composed of several different wavelengths,
and the full range of wavelengths is called the electromagnetic spectrum. We
graph the Sun's light energy (vertical axis) across wavelengths (horizontal
axis) to represent the spectrum (Figure 6-2). Energy from the sun increases
rapidly to a maximum between 0.4 and 0.7 µm, and drops off at higher
wavelengths. Some regions of the electromagnetic spectrum are named: X-rays
have wavelengths of approximately 0.0001 µm, visible light is between 0.4 and
0.7 µm, and near-infrared light is between 0.7 and 1.1 µm.


Our eyes perceive light in the visible portion of the spectrum and are blind to
shorter and longer wavelengths. We assign three base colors: blue, from about
0.4 to 0.5 µm, green from 0.5 to 0.6 µm, and red from 0.6 to 0.7 µm. Other
colors are often described as a mixture of these three colors at varying levels
of brightness. For example, an equal mixture of blue, green, and red light at a
high intensity is perceived as "white" light. Other colors are produced with
other mixes; for example, equal parts red and green light are perceived as
yellow. The specific combination of wavelengths and their relative intensities
produce all the colors visible to the human eye.
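The wavelength regions named above can be expressed as a small lookup. This is
only an illustrative sketch: the boundaries are the approximate values given in
the text, real band limits are not sharp, and the names REGIONS and
spectral_region are hypothetical.

```python
# Approximate wavelength regions in micrometers: (name, lower, upper).
REGIONS = [
    ("blue", 0.4, 0.5),
    ("green", 0.5, 0.6),
    ("red", 0.6, 0.7),
    ("near-infrared", 0.7, 1.1),
]


def spectral_region(wavelength_um):
    """Return the named region that contains a wavelength in micrometers."""
    for name, lower, upper in REGIONS:
        # A boundary value is assigned to the longer-wavelength region.
        if lower <= wavelength_um < upper:
            return name
    return "outside the listed regions"
```

For example, spectral_region(0.55) returns "green", while a wavelength of
0.0001 µm, in the X-ray range, falls outside the listed regions.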


Electromagnetic energy striking an object is reflected, absorbed, or
transmitted. Most solid objects absorb or reflect incident electromagnetic
energy and transmit none. Liquid water and atmospheric gasses are the most
common natural materials that transmit light energy as well as absorb and
reflect it.

[Figure 6-2: Solar radiation above the atmosphere and solar radiation
transmitted through the atmosphere, shown as energy by wavelength (µm).]


Natural objects appear to be the color they most reflect; for example, green
leaves absorb more red and blue light and reflect more green light. We sense
these differences across a range of wavelengths to distinguish among objects.
While we see only visible wavelengths, these differences also extend into other
parts of the electromagnetic spectrum (Figure 6-3). Individual leaves of many
plant species appear to be similar shades of green; however, some reflect a
higher proportion of infrared light, and thus appear to have different "colors"
when viewed at infrared wavelengths.

Energy transmittance through the atmosphere is most closely tied to the amount
of water vapor in the air. Water vapor absorbs energy in several portions of
the spectrum, and higher water vapor content results in lower transmittance.
Carbon dioxide, other gases, and particulates such as dust also contribute to
atmospheric absorption.




Most aerial images and satellites use passive remote sensing, in that they use
energy generated by the sun and reflected off of the target objects. The images
from these passive systems may be affected by atmospheric conditions in
multiple ways. Figure 6-4 illustrates the many paths by which energy reaches a
remote sensing device. Only one of the energy paths is useful, that of surface
reflection from ground features. The other paths provide little or no
information about the target objects. Most passive systems are not useful
during cloudy or extremely hazy periods because nearly all the energy is
scattered or absorbed. There are some passive systems useful at night, e.g.,
emission microwave sensing, but most passive systems rely on the sun's energy,
so are useless at night.

Active systems generate an energy signal and detect the energy returned.
Differences in the quantity and direction of the returned energy are used to
identify features and their properties. Radar (radio detection and ranging) is
a common active system. Radar focuses a beam of energy through an antenna, and
then records the reflected energy. These signals are swept across the landscape
and the returns assembled to produce a radar image. Radar images are usually
monochromatic (in shades of gray). These images may be collected day or night,
and most radar systems penetrate clouds because water vapor does not absorb the
relatively long radar wavelengths.

[Figure 6-3: Spectral reflectance curves for some common substances, shown as
reflectance (%) by wavelength (µm).]

LiDAR systems (or lidar, light detection and ranging) are also common. They use
lasers, typically in the visible or near-infrared portion of the spectrum, to
measure a distance from the scanner to a target, and then calculate heights.
Heights measured across an area may be combined to create a height surface and
DEM. LiDAR is described in detail in a later section.

Images we view either display a single range of wavelengths and appear in gray
shades, or are composite color images, with specific wavelength regions set to
color blue, green, and red outputs (Figure 6-5). Color images are most common,
as we usually collect at least three spectral bands, often four, and with some
scanners up to hundreds of bands. Each band records the amount of energy in a
specific set of wavelengths, e.g., a near-infrared band may record the energy
returned between 0.9 and 1.0 µm.

Each individual band may be thought of as averaging the energy returned over a
range of wavelengths. Bands may vary in limits, e.g., one camera may record a
green band that senses energy from 0.61 to 0.68 µm, while another camera may
sense energy over the range from 0.62 to 0.67 µm. Data for a band may be
conceived as stored in a grid, representing the reflectance for each cell or
pixel over the area imaged.

We must assign each recorded band to a color on our displayed image. Images,
both on a monitor or a hardcopy, are created by mixing primary colors,
typically blue-green-red. We assign one layer recorded for one band to the
green color, and variation in intensity sensed in the band translates into the
range of dark to light greens for this layer. We do the same for red and blue
layers, assigning an input band to each. We then combine the bands, that is,
the colors for each corresponding pixel, to create a composite, multi-band
image.

True color images are most common, and for these we assign the red-sensed band
to the red color on output, the green-sensed band to the green output color,
and the blue-sensed band to the blue output color. This provides an image
similar to the colors observed by the human eye.

We may also assign different band combinations or different orders. Many aerial
or satellite scanners sample in four or more bands, including an infrared band
in addition to the three visible bands. The Sentinel satellite scanner,
described later in this chapter, collects thirteen bands. We must choose which
of these bands will be assigned respectively to the blue, green, and red
display colors. Different band selections usually result in different displayed
images, with abnormal colors for features, e.g., green vegetation may appear
purple or red, depending on the band combinations used. We typically choose the
bands that best help us identify important features.
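Band-to-color assignment can be sketched with plain Python lists. This is a toy
illustration under stated assumptions: each band is a small grid of brightness
values with identical dimensions, the sample values are invented for the
example, and the function name composite and the band labels are hypothetical.

```python
def composite(bands, red, green, blue):
    """Build a grid of (R, G, B) pixels from three named input bands."""
    r, g, b = bands[red], bands[green], bands[blue]
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[i]))]
        for i in range(len(r))
    ]


# Invented 2 x 2 brightness grids, one per recorded band.
bands = {
    "blue": [[10, 20], [30, 40]],
    "green": [[50, 60], [70, 80]],
    "red": [[90, 100], [110, 120]],
    "nir": [[200, 210], [220, 230]],
}

# True color: each visible band drives its own display color.
true_color = composite(bands, red="red", green="green", blue="blue")

# One possible false-color assignment: near-infrared shown as red.
false_color = composite(bands, red="nir", green="red", blue="green")
```

The same recorded data produce different displayed colors depending only on
which band is routed to which output color.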
Aerial Images


Images taken from airborne cameras are a primary source of geographic data.
Aerial photography quickly followed the invention of portable cameras in the
mid-19th century, and became a practical reality with the development of
dependable airplanes in the early 20th century (Figure 6-6). Photogrammetry,
the science of measuring geometry and mapping from images, was well developed
by the early 1930s, and there have been continuous refinements since. Aerial
images underpin most large-area maps and surveys in most countries. Digital
mapping cameras became common in the 21st century, largely supplanting film
cameras, and are often carried aboard aircraft optimized for aerial surveys
(Figure 6-7). Aerial images are routinely used in urban planning and
management, construction, engineering, agriculture, forestry, wildlife
management, and other mapping applications.


Although there are hundreds of applications for aerial images, most in support
of GIS may be placed into three main categories. First, aerial images are often
used as a basis for mapping, to identify and outline objects. Measurements on
images offer a rapid and accurate way to obtain geographic coordinates,
particularly when image measurements are combined with ground surveys. In a
second major application, image interpretation may be used to categorize or
assign attributes to surface features. Images are often used for landcover and
infrastructure mapping, and to assess the extent of fire, flood, or other
damage. Finally, images are often used as a backdrop for maps of other
features, as when photographs are used as a background layer on web maps or for
soil survey maps produced by the U.S. Natural Resources Conservation Service.

Figure 6-7: Aerial photographs are often taken from specialized aircraft, such
as a low-altitude biplane, or from helicopters or higher-flying aircraft
(courtesy Seibud Lid).


Aerial Mapping Systems 


There are aerial camera systems and platforms specifically designed for
mapping. The camera and components are built to minimize geometric distortion
and maximize image quality. Mapping cameras have features to reduce image blur
due to aircraft motion, enhancing image quality. They maintain or record
orientation angles, so distortions can be removed. These camera systems are
precisely made, sophisticated, highly specialized, and expensive, and most
often used when large-area, high-resolution, accurate images are required.

Modern aerial cameras are typically mounted inside an aircraft, pointing
through an underside bay (Figure 6-8). The camera mount and aircraft control
systems are designed to maintain the camera optical axis as near vertical as
possible. Aircraft navigation and control systems are specialized to support
aerial photography, with precise positioning and flight control.


Aerial cameras specialized for spatial data collection are large, expensive,
and complex, but in principle they are similar to simple cameras. A simple
camera consists of a lens and a body (Figure 6-9). The lens is typically made
of several individual glass elements, with a diaphragm or other mechanism to
control the amount of light reaching the sensing media, the digital sensor (or
in times past, film). These sensors have a characteristic dimension, sd, and
for digital sensors, a pixel size, that when combined with the flying height
(H) and focal length (h), determine the ground resolution and imaged area. An
exposure control, such as a shutter, within the lens, sets the length of time
the sensing element is exposed to light. Cameras also have an optical axis,
defined by the sensor orientation. The optical axis is the central direction of
the incoming image, and it is precisely oriented to intersect the sensor in a
perpendicular direction. Digital sensors are connected to electronic storage so
that successive images may be saved. Images are recorded at a surface called
the camera's focal plane, ideally perpendicular to the optical axis. The time,
altitude, and other conditions or information regarding the photographs or
mapping project may be recorded by the camera, often as an electronic header on
digital image files.

Image scale and extent are important attributes of remotely sensed data. Image
scale, as in map scale, is defined as the relative distance on the image to the
corresponding distance on the ground. For example, 1 inch on a 1:15,840-scale
photograph corresponds to 15,840 inches on the Earth's surface. As shown in
Figure 6-9, image scale will be equal to the ratio of focal length to flying
height, h/H.


Image extent is the area covered by an image, and depends on the physical size
of the sensing area or element (sd in Figure 6-9), the camera focal length (h),
and the flying height (H), according to:

$$gd = sd \cdot \frac{H}{h} \qquad (6.1)$$


The extent depends on the physical size of the recording media, sd (e.g., a
5 x 5 cm digital sensor), and the lens system and flying height. For example, a
5 cm sensing element with a 4 cm focal length lens flown at 3,000 m height
(about 10,000 ft) results in an extent of approximately 3.75 by 3.75 km (about
2.3 by 2.3 mi) on the surface of the Earth.
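The scale and extent relationships can be checked numerically. The sketch below
assumes a vertical image over flat terrain and consistent units (meters here);
the function names image_scale and ground_extent are hypothetical.

```python
def image_scale(focal_length_m, flying_height_m):
    """Image scale as the ratio of focal length to flying height (h / H)."""
    return focal_length_m / flying_height_m


def ground_extent(sensor_dimension_m, focal_length_m, flying_height_m):
    """Ground distance covered by one sensor dimension: gd = sd * H / h."""
    return sensor_dimension_m * flying_height_m / focal_length_m


# Example from the text: 5 cm sensor, 4 cm focal length, 3,000 m flying height.
extent_m = ground_extent(0.05, 0.04, 3000.0)
scale_denominator = 1.0 / image_scale(0.04, 3000.0)
```

With these inputs the extent is about 3,750 m on a side, and the image scale is
about 1:75,000.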

Image resolution is another important concept. The resolution is the smallest
object that can reliably be detected on the image. Resolution in digital
cameras is often set by the pixel size, the size of individual sensing elements
in the sensing array. For example, a 5 x 5 cm array with 7,000 cells in each
direction will have a cell size of 0.05/7,000, which is approximately
7.1 x 10^-6 m, or 7.1 µm.


The realized or ground resolution on aerial images may be approximately
calculated from equation (6.1), substituting cell dimension for sensor
dimension, sd. In our example, if the camera has a 10 cm (0.1 m) focal length,
and is flown at 3,000 m, the ground resolution is:

$$0.21\ \text{m} = 7.1 \times 10^{-6} \cdot 3000 / 0.1 \qquad (6.2)$$
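The same arithmetic can be written as a short calculation. This sketch assumes
square cells, a vertical image, and units of meters throughout; the function
names cell_size and ground_resolution are hypothetical.

```python
def cell_size(sensor_dimension_m, cells_per_side):
    """Size of one sensing cell on the sensor, in meters."""
    return sensor_dimension_m / cells_per_side


def ground_resolution(cell_size_m, focal_length_m, flying_height_m):
    """Approximate ground size of one cell: cell size * H / h."""
    return cell_size_m * flying_height_m / focal_length_m


# Example from the text: 5 cm array, 7,000 cells per side,
# 0.1 m focal length, 3,000 m flying height.
cell_m = cell_size(0.05, 7000)
resolution_m = ground_resolution(cell_m, 0.1, 3000.0)
```

The result is a cell of roughly 7.1 µm on the sensor and a ground resolution of
roughly 0.21 m. Doubling the flying height would double the ground cell size.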


Resolution also depends on the contrast of objects, and depends on sensor
response. Resolution may be tested via photographs of alternating patterns of
black and white lines. At some threshold of line width, the difference between
black and white lines cannot be distinguished, giving the effective
resolution.


Digital Aerial Cameras 


Digital aerial cameras are the most common systems used for aerial mapping.
They typically consist of an electronic housing that sits atop a lens assembly
(Figure 6-10). The lens focuses light onto charge-coupled devices (CCDs) or
similar electronic sensing elements. The CCD contains linear or rectangular
arrays of pixels, or picture elements, that respond to light.

The sensing element is composed of layers of semiconducting material with
appropriate reflective and absorptive coatings, insulators, and conducting
electrodes (Figure 6-11). Incoming radiation passes through the coatings and
into the semiconductors, dislodging electrons and creating a voltage or
current. Response may be calibrated and converted to measures of light
intensity. Response varies across wavelength, but can be tuned to wavelength
regions by manipulating semiconductor composition. Since the pixels are in an
array, the array then defines an image.

Digital cameras sometimes use a multi-lens cluster rather than a single lens,
or they may split the beam of incoming light via a prism or other mechanism.
Because a CCD is typically configured to be sensitive to only a narrow band of
light, multiple CCDs may be used, each with a dedicated lens and a specific
waveband. Multiple CCDs typically allow more light for each pixel and waveband,
but this increases the complexity of the camera system. If a multi-lens system
is used, the individual bands from the multiple lenses and CCDs must be
carefully coregistered, or aligned, to form a complete multiband image.

Digital cameras most commonly collect images in the blue (0.4-0.5 µm), green
(0.5-0.6 µm), or red (0.6-0.7 µm) portion of the electromagnetic spectrum. This
provides an image approximately equal to what the human eye perceives. Systems
often also record near-infrared reflectance (0.7-1.1 µm), particularly for
vegetation mapping. The camera may also have a set of filters that may be
placed in front of the lens, for example, for protection or to reduce haze.

Digital cameras typically have a computer control system, used to specify the
location, timing, and exposure; record GPS and aircraft altitude and
orientation information; provide data transfer and storage; and allow the
operator to monitor progress and image quality during data collection.

Digital cameras may have several features to improve data quality. For example,
digital cameras may employ electronic image motion compensation, combining
information collected across several rows of CCD pixels. This may lead to
sharper images while reducing the likelihood of camera malfunction due to fewer
moving parts. In addition, digital data may be recorded in long, continuous
strips, easing the production of image mosaics.

While nearly all present and future aerial images will be collected with
digital cameras, there is a vast archive of past aerial images collected on
film. Large-format films, 240 mm (9 in) on a side, were most common for
mapping. Most film-based photographs from mapping cameras have been converted
to digital forms. Their characteristics and use are well described in the
literature and are not described here.


Lens and Camera Distortion 


The camera and lens in any system may be a significant source of geometric
error in aerial images. The perfect lens-camera-detector system would exactly
project the viewing geometry of the target onto the image recording surface,
either film or CCD. The relative locations of features on the image in a
perfect camera system would be exactly the same as the relative locations on
any viewing plane that is in front of the lens. Real camera systems are not
perfect and may distort the image. For example, the light from a point may be
bent slightly when traveling through the lens, or optical axis/focal plane
misalignments may skew the image, both causing image distortion.


The mapping camera systems typically
carried on large aircraft are engineered to
minimize systematic errors. Lenses are
designed and precisely manufactured so that
image distortion is minimized. Camera parts
are optimized to ensure a faithful rendition
of image geometry. Sensors and mounts are
painstakingly aligned to reduce perspective
angle distortion. This optimization leads to
extremely high geometric fidelity in the
camera/lens system. Thus, camera and lens
distortions in mapping cameras are typically
much smaller than other errors, for example,
pointing or terrain errors, discussed in the
next section, or digitizing errors when
converting the image data to forms useful in a
GIS.

Radial lens displacement is a common
form of distortion in less expensive cameras
that should be corrected in mapping projects.
All manufactured lenses contain
imperfections in the lens shape. These cause
radial distortions inward or outward, or
sometimes both, from the true image location
(Figure 6-13). Radial lens displacement is
typically quite small in the large, expensive,
specialized mapping cameras flown on large
aircraft. A radial displacement curve or other
equations can be used to correct camera and
lens errors and yield the highest mapping
accuracies.
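Correction with a radial displacement curve
can be sketched as follows. This is a minimal
sketch, assuming the calibration is supplied
as a table of radial distance versus radial
displacement; the table values and function
names are hypothetical, and real calibration
reports or software may use polynomial or
other models.

```python
import math

# Hypothetical calibration table: (radial distance from the image
# center in mm, radial displacement in mm). A positive displacement
# means the point was imaged too far outward.
CALIBRATION_CURVE = [(0.0, 0.000), (40.0, 0.004), (80.0, -0.002), (120.0, -0.006)]


def radial_displacement(r, curve=CALIBRATION_CURVE):
    """Linearly interpolate the displacement at radial distance r."""
    if r <= curve[0][0]:
        return curve[0][1]
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if r <= r1:
            t = (r - r0) / (r1 - r0)
            return d0 + t * (d1 - d0)
    return curve[-1][1]  # beyond the table: hold the last value


def correct_point(x, y, curve=CALIBRATION_CURVE):
    """Shift an image point (x, y), measured from the image center,
    radially by the negative of the calibrated displacement."""
    r = math.hypot(x, y)
    if r == 0.0:
        return (x, y)
    scale = (r - radial_displacement(r, curve)) / r
    return (x * scale, y * scale)
```

A point imaged 40 mm from the center with a
calibrated displacement of 0.004 mm is moved
inward to 39.996 mm; the center point is left
unchanged.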


Camera-caused geometric errors are
often quite large when using a non-mapping
camera, common with drone-based imaging.
Radial distortion may be large when compared
to mapping cameras. Non-mapping cameras may
be appropriate when geometric accuracy
requirements are lower and when proper
methods are applied to calibrate and correct
the largest geometric errors. These methods
usually include bench or field scanning of
image calibration panels (Figure 6-14) and
appropriate correction models. Software may
be applied to estimate error patterns,
calculate camera/lens-specific correction
equations, and apply these to reduce
distortion. While not as accurate as mapping
cameras, corrected images from non-mapping
cameras may meet project accuracy thresholds.
If these correction methods are not applied,
large geometric errors are likely. The
following sections describe remaining
geometric error causes and removal, assuming
camera/lens distortions have been addressed.


Aerial Image Spatial Accuracy 

Aerial images are a rich source of spatial
information, but most aerial images contain 
some geometric distortions (Figure 6-15 
through Figure 6-18), even if taken with per- 
fect camera systems. These distortions must 
be corrected before the images are used in 
mapping. 

We most often conceive of our spatial
data layers as orthographic, with all objects
projected onto a common, two-dimensional
plane (Figure 6-15, right). Objects above or
below the plane are vertically projected
down or up onto the horizontal plane. Thus,
the top and bottom of a building should be
“flattened downward” onto the same
footprint in the datum plane. In our ideal
data set, the tops of all buildings would be
visible, but none of the sides.

Unfortunately, most aerial or satellite
images provide a non-orthographic perspective
view (Figure 6-15, left). Perspective
views give a geometrically distorted image
of the Earth's surface. Distortion affects the
relative positions of objects, and uncorrected
data derived from aerial images may not
directly overlay data in an accurate
orthographic map. The amount of distortion
in aerial images may be reduced by selecting
the appropriate camera, lens, flying height,
and type of aircraft. Distortion may also be
controlled by collecting images under proper
weather conditions during periods of low
wind and by employing skilled pilots and
operators. However, some aspects of the
distortion may not be controlled, and no
camera system is perfect, so there is some
geometric distortion in every uncorrected
aerial image. The real question becomes, "is
the distortion and geometric error below
acceptable limits, given the intended use of
the spatial data?” This question is not unique
to aerial images; it applies equally well to
satellite images, spatial data derived from
GNSS and traditional ground surveys, or any
other data.

The largest distortion in aerial images
taken with mapping cameras comes from
two sources: terrain variation and camera
tilt, particularly when using an aerial
mapping camera or when properly calibrating
and correcting non-mapping cameras, e.g.,
with drone/UAV images. Atmospheric bending
is relatively minor under most conditions
when collecting aerial images, but may still
be unacceptable, particularly when the
highest-quality data are required. Established
methods should be used to reduce the
typically dominant tilt and terrain errors,
and for non-mapping cameras, the previously
described lens and camera distortion.


Terrain and Tilt Distortion in 
Aerial Images 


Terrain variation, defined as differences
in elevation within the image area, is often
the largest source of geometric distortion in
aerial images. Terrain variation causes relief
displacement, defined as the horizontal
shifts in object positions on images due to
differences in elevation.

Figure 6-16 illustrates the basic
principles of relief displacement. The figure
shows the image geometry over an area with
substantial differences in terrain. The
reference surface (datum plane) in this
example is chosen to be at the elevation of
the nadir point directly below the camera, N
on the ground, and imaged at n on the
photograph. The camera station P is the
location of the camera at the time of the
photograph. We are assuming a vertical
photograph, meaning the optical axis of the
lens points vertically below the camera and
intersects the reference datum surface at a
right angle at the nadir location.


The locations for points A and B are
shown on the ground surface. The
corresponding locations for these points occur
at A' and B' on the reference datum surface.
These locations are projected onto the imaging
sensor or film, as they would appear in a
photograph taken over this varied terrain. In
a real camera, the sensor is behind the lens;
however, it is easier to visualize the
displacement by showing the sensor in front of
the lens, and the geometry is the same. Note
that the points a and b are displaced from
their reference surface locations, a' and b'.
The point a is displaced radially outward
relative to a', because the elevation at A is
higher than the reference surface. The
displacement of b is inward relative to b',
because B is lower than the reference datum.


Note that any points that have elevations 
exactly equal to the elevation of the refer- 
ence datum will not be displaced, because 
the reference and ground surfaces coincide 
at those points. 


Figure 6-16 illustrates the following key 
characteristics of terrain distortion in vertical 
aerial images: 

Terrain distortions are radial - higher
elevations are displaced outward, and lower
elevations displaced inward relative to the
center point.

Relief distortions affect angles and distances
on an image - relief distortion changes the
distances between points, and will change
most angles. Straight lines on the ground
will not appear to be straight on the image,
and areas will expand or shrink.

Scale is not constant on uncorrected aerial
images - scale changes across the photograph
and depends on the magnitude of the
relief displacement. We may describe an
average scale for a vertical aerial photograph
over varied terrain, but the true scale
between any two points will often differ.

A vertical aerial image taken over varied
terrain is not orthographic - we cannot
expect geographic data from terrain-distorted
images to match orthographic data in
a GIS. If the distortions are small relative to
digitizing error or other sources of geometric
error, then data may appear to match data
from orthographic sources. If the relief
displacement is large, it will add significant
errors.

Camera tilt may be another large source
of positional error in aerial images. Camera
tilt, in which the optical axis points at a
non-vertical angle, results in complex
perspective convergence in aerial images
(Figure 6-17). Objects farther away appear to
be closer together than equivalently spaced
objects
that are nearer the observer (Figure 6- 

Tilt distortion is zero in vertical photo- 
graphs, and increases as tilt increases. 


Contracts for aerial mapping missions
typically specify tilt angles of less than 3
degrees from vertical. Perspective distortion
caused by tilt is somewhat difficult to
remove, and removal tends to reduce
resolution near the edges of the image.
Therefore, efforts are made to minimize tilt
distortion by maintaining a vertical optical
axis when images are collected. Camera
mounting systems are devised so the optical
axis of the lens points directly below, and
pilots attempt to keep the aircraft on a smooth
and level flight path as much as possible.
Planes have stabilizing mechanisms, and
cameras may be equipped with compensating
mechanics to maintain an untilted axis.
Despite these precautions, tilt happens, due
to flights during windy conditions, pilot or
instrument error, or system design.


Figure 6-19: Tilt rotation angles omega, phi,
and kappa about the X, Y, and Z axes.


Tilt is often characterized by three
angles of rotation, omega (ω), phi (φ), and
kappa (κ) (Figure 6-19), about the X, Y, and
Z axes that define three-dimensional space.
Rotation about the Z axis does not result in
tilt distortion, because the Z axis is
perpendicular to the surface. If ω and φ are
zero, then there is no tilt distortion.
However, tilt is almost always present, even
in small values, so all three rotation angles
are required to describe and correct it.
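The role of the three angles can be
illustrated with a rotation matrix. This is a
minimal sketch, assuming the rotation order
R = Rz(kappa) Ry(phi) Rx(omega) and an
untilted optical axis along the Z axis; sign
and ordering conventions differ among
photogrammetric texts and software, and the
function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(omega, phi, kappa):
    """Return R = Rz(kappa) * Ry(phi) * Rx(omega); angles in radians."""
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    rx = [[1, 0, 0], [0, co, -so], [0, so, co]]
    ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]
    rz = [[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def tilt_from_vertical(omega, phi, kappa):
    """Angle (radians) between the rotated optical axis and vertical."""
    r = rotation_matrix(omega, phi, kappa)
    # The Z component of the rotated Z axis is r[2][2] = cos(phi) * cos(omega);
    # kappa does not appear, so rotation about Z adds no tilt.
    return math.acos(max(-1.0, min(1.0, r[2][2])))
```

With omega and phi both zero, the tilt is zero
for any kappa; an omega of 3 degrees gives a
3 degree tilt regardless of kappa.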

Tilt and terrain distortion may occur
together on aerial images taken over varied
terrain. The overall level of distortion
depends on the amount of tilt and the
variation in terrain, and also on the
photographic scale. Figure 6-20 illustrates
the changes in total distortion with changes
in tilt, terrain, and image scale. This figure
shows the error that would be expected in data
digitized from vertical aerial images when
only applying an affine transformation (see
Chapter 4), a simple procedure used to
register data layers. These error plots mimic
digitizing from partially corrected aerial
images, common in less-sophisticated
processing packages, or when GNSS data are
sub-optimal. Note first that there is zero
error across all scales when the ground is
flat and there is no tilt (bottom line, top
panel in Figure 6-20). With tilt, errors
increase as image scale decreases, as you move
from left to right in both panels. Error also
increases as tilt or terrain increase.
Geometric errors can be quite large, 
even for vertical images over moderate ter- 
rain (Figure 6-20, lower panel). These
graphs clearly show that tilt alone introduces
large errors, and terrain further degrades 
geometric accuracy in aerial images to the 
point where they are inappropriate sources 
for GIS data. Some form of correction, usu- 
ally based on stereo coverage and projection 
geometry, is required. 


Stereo Photographic Coverage 
As noted above, relief displacement in
vertical aerial images adds a radial
displacement that depends on terrain heights.
The larger the terrain differences, the larger
the relief displacement. If two overlapping
photographs are taken, called a stereopair,
these photographs may be used together to
determine the relative elevation differences
and remove distortion.

Figure 6-20: Terrain and tilt effects on mean
positional error when digitizing from
uncorrected aerial images (panels: Flat
Terrain, Varied Terrain; horizontal axis:
photo scale, 1000s). Error increases as tilt
and terrain increase, and as photo scale
decreases (from Bolstad).

A stereomodel is a three-dimensional
perception of terrain or other objects that we
see when viewing a stereopair. As each eye
looks at a different, adjacent photograph
from the overlapping stereopair, we observe
a set of relative spatial shifts in objects, and
our brain may convert these to a perception
of depth.

Stereomodels are visible in stereopairs
due to parallax, a shift in relief displacement
due to a shift in observer location. Figure
6-21 illustrates parallax. The block (closer to
the viewing locations) appears to shift more
than the sphere when the viewing location is
changed from the left to the right side of the
objects. The displacement of any given point
is different on the left vs. the right views
because the relative viewing geometry is
different. Points are shifted by different
amounts, and the magnitude of the shift
depends on the distance from the observer
(or camera) to the objects. This shift in
position with a shift in viewing location is by
definition the parallax, and is the basis of
depth perception.

Many mapping projects collect stereo
photographic coverage, in which sequential
photographs in a flight line overlap, called
endlap, and adjacent flightlines overlap,
called sidelap (Figure 6-22). Stereo
photographs typically have near 65% endlap
from one image to the next in a line, and 25%
sidelap for adjacent lines. Some digital
cameras collect data in continuous strips and
so only collect sidelap. Drone-based
collections often have nearly 100% endlap or
sidelap, depending on the project and method
for data collection.
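The overlap percentages translate directly
into the spacing between exposures and
between flight lines. This is a minimal
sketch, assuming a rectangular ground
footprint of known size for each photograph
over flat terrain; the function and parameter
names are hypothetical.

```python
def photo_spacing(footprint_along_m, footprint_across_m,
                  endlap=0.65, sidelap=0.25):
    """Return (distance between exposures, distance between flight lines).

    footprint_along_m and footprint_across_m are the ground dimensions
    covered by one photograph, along and across the flight line.
    endlap and sidelap are fractions in the range [0, 1).
    """
    if not (0.0 <= endlap < 1.0 and 0.0 <= sidelap < 1.0):
        raise ValueError("overlap fractions must be in [0, 1)")
    exposure_spacing = footprint_along_m * (1.0 - endlap)
    line_spacing = footprint_across_m * (1.0 - sidelap)
    return exposure_spacing, line_spacing
```

For a photograph covering 2000 m by 2000 m on
the ground, 65% endlap and 25% sidelap imply
an exposure roughly every 700 m along the
line and flight lines about 1500 m apart.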
Geometric Correction of Aerial
Images 


Due to the geometric distortions
described earlier, aerial images should be
corrected before digitizing. Points, lines,
and area boundaries may not occur in their
correct locations, so length and area
measurements may be incorrect. These
distortions are a complex mix of terrain and
tilt effects, and camera and lens distortions,
and will change the locations, angles, and
shapes of features in the image and any
derived data. Spatial data derived from
uncorrected or partially corrected photographs
will not properly align with features in other
layers. A river may fall on the wrong side of
a road [...].

Given all the positive characteristics of
aerial images, how do we best use this rich
source of information? Fortunately,
photogrammetry provides the tools needed to
remove geometric distortions from
photographs. These corrections depend on two
primary sets of measurements. First, the
location of each image's perspective center
and camera orientation must be known. These
are the effective location of the camera focal
point and pointing direction at the time of
imaging. They can be determined from precise
GNSS, or deduced from ground measurements,
or a combination of both. Second, some direct
or indirect measurement of terrain heights
must be collected. These heights may be
collected at a few points and stereopairs used
to estimate all other heights, or they may be
determined from another source, for example,
a previous survey, radar, or LiDAR systems
described later in this chapter. Armed with
perspective center and height measurements,
we may correct our aerial images.

Geometric correction of aerial images
involves calculating the distortion at each
point, and shifting the image location to the
correct orthographic position. Consider the
tower in Figure 6-23. The bottom of the
tower at B is imaged on the photograph at
point b, and the top of the tower at point A is
imaged on the photograph at point a. Point A
will occur on top of point B on an
orthographic map, that is, point B won't be
visible. If we consider the flat plane at the
base of the tower as the datum, we can use
simple geometry to calculate the displacement
from a to b on the image. We'll call
this displacement d, and go through an
explanation of the geometry used to calculate
the displacement.

Figure 6-23: Relief displacement geometry,
based on the similar triangles S-N-C and
a-n-C. We usually know the flying height, H,
and can measure d and p on the photograph.


Observe two similar triangles in Figure
6-23, one defined by the points S-N-C, and
one defined by the points a-n-C. These
triangles are similar because the angles are
equal, that is, the interior angles at n and N
are both 90°, the triangles share the angle at
C, and the interior angle at S equals the
interior angle at a. C is the focal center of
the camera lens, and may be considered the
location through which all light passes. The
film in a camera is placed behind the focal
center; however, as in previous figures, the
film is shown here in front of the focal center
for clarity. Note that the following ratios
hold for the similar triangles:


$$\frac{D}{P} = \frac{h}{H} \qquad (6.2)$$

and also

$$\frac{d}{p} = \frac{D}{P} \qquad (6.4)$$

so

$$\frac{d}{p} = \frac{h}{H} \qquad (6.5)$$

rearranging,

$$d = \frac{p \cdot h}{H} \qquad (6.6)$$

where:
d = displacement distance
p = distance from the nadir point, n, on
the vertical photo to the imaged point a
H = flying height
h = height of the imaged point

We usually know the flying height, and
can measure the distance p. If we can get h,
the height of the imaged point above the
datum, then we can calculate the displacement.
We might climb or survey the tower to
measure its height, h, and then calculate the
photo displacement by equation (6.6). Relief
displacement for any elevated location may
be calculated provided we know the height.
Heights have long been calculated by
measurements from stereopairs, but are
increasingly measured using LiDAR, described
later in this chapter. These heights and
equations are used to adjust the positional
distortion due to elevation, “moving” imaged
points to an orthographic position.
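The relief displacement relation, d = p h / H,
and the resulting shift to an orthographic
position can be sketched as follows. This is
a minimal sketch for a vertical photograph,
assuming image coordinates are measured from
the nadir point n; the function names and the
sample numbers are hypothetical.

```python
import math


def relief_displacement(p, h, H):
    """Relief displacement d = p * h / H.

    p: radial distance on the photo from the nadir point to the imaged point
    h: height of the imaged point above the datum
    H: flying height above the datum (same units as h)
    The result is in the same units as p.
    """
    if H <= 0:
        raise ValueError("flying height must be positive")
    return p * h / H


def to_orthographic(x, y, h, H):
    """Move an image point (x, y), measured from the nadir point,
    radially by -d so it falls at its datum (orthographic) position."""
    p = math.hypot(x, y)
    if p == 0.0:
        return (x, y)  # the nadir point is never displaced
    d = relief_displacement(p, h, H)
    scale = (p - d) / p
    return (x * scale, y * scale)
```

For example, a point imaged 80 mm from the
nadir point, standing 50 m above the datum
and photographed from 2000 m, is displaced
2 mm outward. A point below the datum has a
negative h and is moved outward by the
correction.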

Figure 6-24 illustrates the distortion in
an image of a straight pipeline right-of-way,
bent on the image by differences in height
from valleys to ridgetops (left). Knowledge
of image geometry allows us to correct the
distortion (Figure 6-24 right).

Equation (6.6) applies to vertical aerial
images. When photographs are tilted, the
distortion geometry is much more complicated,
as are the equations used to calculate
tilt and elevation displacement. Equations
may be derived that describe the
three-dimensional transfer from the terrain
surface to the two-dimensional film plane.
These equations and the methods for applying
them are part of the science of
photogrammetry.

Typically, measurements of image x and
y are initially specified relative to some
arbitrary, image-specific coordinate system.
These measurements are obtained from one
or many images. Ground X, Y, and Z
coordinates come from precise global
positioning system surveys, and these points
are identified on each image, and the
arbitrary, image-specific x and y coordinates
extracted.

A set of equations is written that relates
image x and y coordinates to ground X, Y,
and Z coordinates. The set of equations is
solved, and the displacement calculated for
each point on each image. There is a separate
set of equations for each image, relating
the displacement to locations on the image.
The displacement may then be removed and
an orthographic image or map produced.
Although some (hopefully small) distortion
remains, distances, angles, and areas are
more accurately represented on the output
image. These output images, also known as
orthophotographs or digital orthographic
images, have the positive attributes of
photographs, with their rich detail and timely
coverage, and some of the positive attributes
of cartometric maps, such as uniform scale and
true geometry.
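One common way to relate image x and y to
ground X, Y, and Z is the collinearity form
used in many photogrammetry texts; it is
shown here as an illustration and is not
necessarily the exact set of equations
referred to above. This is a minimal sketch of
the forward direction only (ground to image),
assuming a known camera position, focal
length, and rotation matrix m; all names are
hypothetical.

```python
def ground_to_image(ground, camera, focal_length, m=None):
    """Project a ground point (X, Y, Z) to image coordinates (x, y).

    ground, camera: (X, Y, Z) tuples in the same ground units
    focal_length: in the units wanted for the image coordinates
    m: 3x3 rotation matrix (nested lists); identity means a vertical photo
    """
    if m is None:
        m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    dx = ground[0] - camera[0]
    dy = ground[1] - camera[1]
    dz = ground[2] - camera[2]
    u = m[0][0] * dx + m[0][1] * dy + m[0][2] * dz
    v = m[1][0] * dx + m[1][1] * dy + m[1][2] * dz
    w = m[2][0] * dx + m[2][1] * dy + m[2][2] * dz
    if w == 0.0:
        raise ValueError("point lies in the plane of the perspective center")
    return (-focal_length * u / w, -focal_length * v / w)
```

For a vertical photograph taken from 2000 m
with a 0.15 m focal length, a datum-level point
400 m east of the nadir images at x = 0.03 m.
Raising the same point 100 m above the datum
moves its image outward by p h / H, which
agrees with the relief displacement relation
for vertical photographs. In practice the
equations are solved in the other direction,
using ground control to estimate the camera
position and orientation.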

Multiple images or image strips may be 
analyzed, corrected, and stitched together 
into a single mosaic. This process of
developing photomodels of multiple images all
at once utilizes interrelated sets of equations
to
find a globally optimum set of corrections 
across all images. 


Small Unmanned Aerial Vehicles: Drones


Small, unmanned planes and helicopters
have been introduced over the past decade
(Figure 6-25). Variously called unmanned
aerial vehicles (UAVs), remotely piloted
vehicles (RPVs), or simply drones, they may
substantially reduce the cost and increase the
flexibility of data collection. Data from
drones often require increased processing
times and exhibit more variable accuracy,
given the small footprint and greater
difficulty maintaining a level orientation in
these small aircraft. They most often carry
small digital cameras, although drones may
also carry other sensor systems. Image
acquisitions are often at low flying heights
and extremely high resolutions (inches to tens
of centimeters), and small areas.


One primary advantage of drones is low
cost and ease of deployment. Many drone
systems for professional-quality GIS data
collection currently cost less than $20,000,
and some below $5,000, including all
subsystems for converting raw images into
orthographic, georeferenced images. Drones
may be carried to a site and launched, often
by hand for the smallest units, using
preprogrammed flight lines to collect images
along a specified path (Figure 6-26). This
allows data collection at the time and place of
interest, provided weather conditions allow.
Smaller data collection windows are needed,
increasing the likelihood of successful data
capture.

High data resolutions may be another
advantage of UAVs. Because they may be
flown at low altitudes, pixel sizes of a few
centimeters or less are possible. Individual
bridge beams, rooftop fans, or paths may be
resolved, leading to more detailed data, with
very high point densities (Figure 6-27).
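The link between flying height and pixel size
can be made concrete with the usual
ground sample distance relation for a vertical
image over flat ground, pixel size on the
ground = detector pixel pitch x flying height
/ focal length. This is a minimal sketch; the
camera values in the example are hypothetical
and do not describe any particular drone.

```python
def ground_sample_distance(pixel_pitch_m, focal_length_m, flying_height_m):
    """Ground size of one pixel (m) for a vertical image over flat ground."""
    if focal_length_m <= 0:
        raise ValueError("focal length must be positive")
    return pixel_pitch_m * flying_height_m / focal_length_m


def footprint_width(pixel_pitch_m, focal_length_m, flying_height_m, pixels_across):
    """Ground width (m) covered by one image row of pixels_across pixels."""
    gsd = ground_sample_distance(pixel_pitch_m, focal_length_m, flying_height_m)
    return gsd * pixels_across
```

A hypothetical camera with 4 micrometer
pixels and an 8 mm lens flown at 60 m gives
pixels of about 3 cm, and a 5000-pixel-wide
image covers only about 150 m on the ground,
which illustrates the trade-off between
resolution and coverage.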

Smaller UAVs are limited in their data
collection rates, and likely will not be
suitable for areas much larger than a few to
tens of square kilometers. They fly at
relatively low speeds, and typically carry
small camera systems with commensurately
small image footprints. UAVs for GIS data
collection range in size from less than a meter
wingspan through several meters, and there is
a trade-off, with increasing costs associated
with increasing system throughput. Larger
UAVs may collect data at rates approaching
current manned aerial systems, but in doing
so lose many of the cost, flexibility, and
resolution advantages.


One primary disadvantage of drones is
that spatial data from small UAVs are often
of poor quality. Care should be taken in
choosing the proper system and evaluating
realized spatial accuracies. Don't confuse
accuracy with resolution. Cameras often
have large spherical lens distortion. Small
UAVs often use less accurate GNSS and
have larger positioning errors, and use
inappropriate corrections in software.

These limitations may be addressed by
choosing specialized drones and software
optimized for spatial data collection (Figure
6-28). For example, image correlation and
3D reconstruction algorithms may be robust
in finding correct image orientations, and
advising the analysts when they are unable
to reach acceptable accuracies. Lens or other
system distortions may be removed through
precise calibration using standardized test
patterns. These measures are not available or
applied for many UAV systems or software
marketed as spatial data collection tools. It
is up to the data consumer to verify the
proclaimed accuracy of any system.


Photo Interpretation 


Aerial images are used primarily to
identify the position and properties of
interesting features. Once acquired, we must
apply photo (or image) interpretation to
convert images into information in the form of
vector or raster feature layers. Photo
interpretation is a well-developed discipline,
with many specialized techniques. We will
provide a very brief description of the
process. More complete descriptions are found
in several of the sources listed at the end of
this chapter.


Interpreters use the size, shape, color,
brightness, texture, and relative and absolute
location of features to interpret images
(Figure 6-29), typically digitizing directly on
a screen-displayed image, as described in
Chapter 4. Differences in these diagnostic
characteristics allow the interpreter to
distinguish among features. In the figure, the
polygon near the center of the image labeled
Po-C, a pasture, is noticeably smoother than
the polygons surrounding it; the polygon
above it labeled As-Y1 shows a finer-grained
texture reflecting smaller tree crowns than
the polygon labeled NH-M above it and to
the left. Different vegetation types may show
distinct color or texture variations, road
types may be distinguished by width or the
occurrence of a median strip, and building
types may be defined by size or shape.

The proper use of all the diagnostic
characteristics requires that the photo inter- 
preter develop some familiarity with the fea- 
tures of interest. For example, it is difficult 
to distinguish the differences between many 
crop types until the interpreter has spent time 
in the field, photos in hand, comparing what 
appears on the photographs with what is 
found on the ground. This ground reference 
data is invaluable in developing the local 
knowledge required for accurate image 
interpretation. When possible, ground visits 
should take place contemporaneously with 
the photographs. 


Photo interpretation requires we
establish a target set of categories for
interpreted features. If we are mapping roads,
we must decide what classes to use; for
example, all roads will be categorized into
one of these classes: unpaved, paved single
lane, paved undivided multi lane, and paved
divided multi lane. These categories must be
inclusive, so that in our photos there must be
no roads that are multi lane and unpaved. If
there are roads that do not fit in our defined
classes, we must fit them into an existing
category, or we must create a category for
them.


Photo interpretation also requires we
establish a minimum mapping unit, or MMU.
A minimum mapping unit defines the lower
limit on what we consider significant, and
usually defines the area, length, and/or width
of the smallest important feature. The arrow
in the lower right corner of Figure 6-29
points to a forest opening smaller than our
minimum mapping unit for this example
map. We may not be interested in open
patches smaller than 0.5 ha, or road segments
shorter than 50 m long. Although they
may be visible on the image, features
smaller than the minimum mapping unit are
not delineated nor transferred into the digital
data layer.
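Applying a minimum mapping unit amounts to a
size filter on candidate features. This is a
minimal sketch using the example thresholds
above (0.5 ha and 50 m); the feature records
and field names are hypothetical and stand in
for whatever attributes a GIS layer stores.

```python
def apply_mmu(features, min_area_ha=0.5, min_length_m=50.0):
    """Keep only features at or above the minimum mapping unit.

    Each feature is a dict with a "kind" of "polygon" or "line", and an
    "area_ha" (polygons) or "length_m" (lines) attribute.
    """
    kept = []
    for feature in features:
        if feature["kind"] == "polygon" and feature["area_ha"] < min_area_ha:
            continue  # patch too small to delineate
        if feature["kind"] == "line" and feature["length_m"] < min_length_m:
            continue  # segment too short to delineate
        kept.append(feature)
    return kept


candidates = [
    {"id": 1, "kind": "polygon", "area_ha": 0.2},   # small forest opening
    {"id": 2, "kind": "polygon", "area_ha": 3.1},
    {"id": 3, "kind": "line", "length_m": 35.0},    # short road segment
    {"id": 4, "kind": "line", "length_m": 410.0},
]
mapped = apply_mmu(candidates)  # features 2 and 4 remain
```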


Images may need enhancement to
improve feature identification. Common
adjustments include band selection, and
modification of display brightness, contrast,
or image histograms. Bands typically keep a
one-to-one correspondence with true color
images, matching each output color to the
respective input color. Band selection is
needed when more than three bands are
collected, most commonly three visible plus
near- or mid-infrared bands. The analyst
must choose which bands to display and in
which output color, e.g., assigning green,
red, and infrared image layers to the blue,
green, and red colors of the output display, to
yield a typical “false color” image with
enhanced vegetation discrimination. Some
imaging systems collect additional
mid-infrared bands, and different band
combinations have proven best for specific
features, e.g., a green, mid-infrared, near
infrared combination for specialized cameras
and targets.
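Band selection is simply a mapping from image
bands to display colors. This is a minimal
sketch, assuming each band is stored as a
small two-dimensional list of brightness
values; the band names and pixel values are
hypothetical.

```python
def compose_display(bands, red, green, blue):
    """Build rows of (r, g, b) display pixels from three named bands.

    bands: dict mapping a band name to a 2D list of brightness values
    red, green, blue: names of the bands to show in each display color
    """
    r, g, b = bands[red], bands[green], bands[blue]
    return [[(r[i][j], g[i][j], b[i][j]) for j in range(len(r[i]))]
            for i in range(len(r))]


# One row of two pixels: vegetation (bright in infrared) and bare soil.
bands = {
    "green": [[40, 90]],
    "red": [[30, 110]],
    "nir": [[200, 120]],
}

# False color: infrared -> red, red -> green, green -> blue.
false_color = compose_display(bands, red="nir", green="red", blue="green")
```

The vegetation pixel becomes (200, 30, 40), a
strong red on the display, which is why
vegetation stands out in false color images.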
Most GIS software allow you to change
the base color assigned to each image band,
and manipulate the image histograms to
enhance image display (Figure 6-30). We
may optimize the various image histogram
thresholds to best reveal the features we
wish to identify. Analysts typically adjust at
least the upper and lower thresholds for each
band so that the sensed energy spans the
entire range of brightnesses that a computer
monitor can display. Remember from Figure
6-3 that some features reflect a few percent
of the incoming radiation while other
features reflect more than 70% in those same
bands. Differences may be obscured if
display thresholds are not adjusted, reducing
the information that can be extracted from
images. Detailed descriptions of more
sophisticated image enhancements can be
found in most introductory remote sensing
textbooks.
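Adjusting the upper and lower thresholds is
often done with a linear stretch. This is a
minimal sketch for one band, assuming an
8-bit display (0 to 255); the thresholds and
sample values are hypothetical, and GIS
software typically offers several other
stretch types.

```python
def linear_stretch(values, lower, upper, out_max=255):
    """Rescale band values so [lower, upper] spans [0, out_max].

    Values below lower are clipped to 0 and values above upper to out_max.
    """
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    stretched = []
    for v in values:
        t = (v - lower) / (upper - lower)
        t = min(1.0, max(0.0, t))  # clip to the display range
        stretched.append(round(t * out_max))
    return stretched


# Sensed values crowded into a narrow range are spread over the display.
display_values = linear_stretch([10, 20, 40, 100, 120], lower=20, upper=100)
```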
Satellite Images


The previous sections described the
basic principles of remote sensing and the
specifics of aerial image collection and
correction. In many respects, satellite images
are similar to aerial images when used in
GIS. We aim to collect information on the
location and characteristics of features.
However, there are important differences
between photographic and satellite-based
systems used for image collection, and these
differences affect the characteristics and
hence uses of satellite images.

Satellite systems have several
advantages relative to aerial imaging systems.
Satellites offer a very high perspective, which
significantly reduces terrain-caused
distortion. Equation (6.6) shows the terrain
displacement (d) on an image is inversely
related to the flying height (H). Satellites
have large values for H, often 600 km (370
mi) or more above the Earth's surface, so
relief displacements are correspondingly
small. Because satellites are flying above the
atmosphere, their pointing direction is very
precise, and so tilt errors may be controlled.
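The effect of flying height can be shown with
a short calculation based on the ratio
D / P = h / H used in deriving equation (6.6).
This is a minimal sketch; the aircraft height,
feature height, and distance from nadir are
hypothetical values chosen only for
comparison.

```python
def ground_displacement(P, h, H):
    """Horizontal displacement D on the datum from D / P = h / H.

    P: ground distance from the nadir point to the displaced position
    h: height of the feature above the datum
    H: flying (or orbital) height above the datum
    All values share the same units.
    """
    return P * h / H


feature_height_m = 100.0
distance_from_nadir_m = 2000.0

aircraft = ground_displacement(distance_from_nadir_m, feature_height_m, 3000.0)
satellite = ground_displacement(distance_from_nadir_m, feature_height_m, 600000.0)
```

Under these assumptions the displacement is
about 67 m from the aircraft and about 0.33 m
from a 600 km orbit, a 200-fold reduction
that matches the ratio of the two heights.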

There are additional trade-offs in
satellite vs. aerial platforms. Satellite images
typically cover larger areas, so if the area of
interest is large, costs per unit area may be
small; but conversely, costs for small areas
with satellites may be quite high.
Satellite images may require specialized
image processing software. Acquisition of
aerial images may be more flexible because
a pilot can fly or a drone can be sent up on
short notice. Many aerial images have better
effective resolution than satellite images.
Many of these disadvantages of using satellite
images diminish as more, higher-resolution,
pointable scanners are placed in orbit.

Basic Principles of Satellite 
Image Scanners 


Scanners operate by pointing the
detectors at the area to be imaged. Each
detector has an instantaneous field of view, or
IFOV, that corresponds to the size of the area
viewed by each detector (Figure 6-31).
Although the IFOV may not be square and a
raster cell typically is, this IFOV may be
thought of as approximately equal to the
raster cell size for the acquired image.
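When the IFOV is expressed as an angle, the
approximate cell size on the ground follows
from the sensor height. This is a minimal
sketch for a detector looking straight down
over flat ground; the angle and height in the
example are hypothetical and do not describe
a specific satellite.

```python
import math


def ifov_ground_size(ifov_rad, height_m):
    """Ground distance (m) spanned by an angular IFOV viewed at nadir."""
    return 2.0 * height_m * math.tan(ifov_rad / 2.0)
```

For example, an IFOV of 50 microradians viewed
from 600 km spans very nearly 30 m on the
ground, so the acquired image would have
roughly 30 m raster cells.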

The scanner builds a two-dimensional
image of the surface by pointing a detector
or detectors at each cell and recording the
reflected energy. Data are typically collected
in the across-track direction, perpendicular
to the flight path of the satellite, and in the
along-track direction, parallel to the
direction of travel (Figure 6-31). Several
scanner designs achieve this across- and
along-track scanning. Some older designs use a
spot detector and a system of mirrors and
lenses to sweep the spot across track. The
forward motion of the satellite positions the
scanner for the next swath in the along-track
direction. Other designs have a linear array
of detectors, a line of detectors in the
across-track direction (Figure 6-32). The
across-track line is sampled at once, and the
forward motion of the satellite positions the
array for the next line in the along-track
direction. Finally, a two-dimensional array
may be used, consisting of a rectangular
array of detectors. Reflectance is collected in
a patch in both the across-track and the
along-track directions.

A remote sensing satellite also contains a
number of other subsystems, integrated into a
single platform to support image data collection
(Figure 6-33). A power supply is required,
typically consisting of solar panels and
batteries. Precise altitude and orbital control
are needed, so satellites carry navigation and
positioning subsystems. Sensors evaluate
satellite position and pointing direction, and
thrusters and other control components orient
the satellite. There is a data storage
subsystem, and a communications subsystem for
transmitting data back to Earth and for
receiving control and other information. All of
these activities are coordinated by an onboard
computing system.

Figure 6-33: The Landsat 8 satellite in
integration and testing (courtesy NASA).

Several remote sensing satellite systems have
been built, and data have been available for
land surface applications since the early 1970s.
The detail, frequency, and quality of satellite
images have been improving steadily, and there
are several satellite remote sensing systems
currently in operation.

Because most satellites are in near-polar
orbits, images overlap most near the poles.
Adjacent images typically overlap a small amount
near the equator. The inclined orbits are often
sun synchronous, meaning the satellite passes
overhead at approximately the same local time.


High-Resolution Satellite Systems


There is a large and growing number of
high-resolution satellite systems, here rather
arbitrarily defined as those with a resolution
finer than 4 m. This is the resolution long
available on the largest-scale aerial
photographs, and used for fine-scale mapping of
detailed features such as sidewalks, houses,
roads, individual trees, and small-area
landscape change. Commercial systems providing
30 cm resolution are in operation (Figure 6-34),
with higher-resolution systems in the offing.
This detail blurs the distinction between
satellite and photo-based images.


Images from high-resolution satellite systems
may provide a suitable source for spatial data
in a number of settings. They are typically
required by cities and businesses for fine-scale
asset management, for example, in urban tree
inventories, construction monitoring, or storm
damage assessment. Nearly all the systems have
pointable optics or satellite orientation
control, resulting in short revisit times, on
the order of one to a few days.

Figure 6-34: An example of the detail available
in a 0.3 m resolution satellite image.

Spectral range, price, availability,
reliability, flexibility, and ease of use may
become more important factors in selecting
between aerial images and high-resolution
satellite images. Satellite data are attractive
when collecting data for larger areas, or where
it is unwise or unsafe to operate aircraft, or
because data for large areas may be
geometrically corrected for less cost and time.
Aerial images may be preferred when resolutions
of a few to tens of centimeters are needed, or
for smaller areas, under narrower acquisition
windows, or with instrument clusters not
possible from space. Aerial images will not be
completely replaced by satellites, but they may
well be pushed towards the finest resolutions
and county-sized or smaller collections.

As of early 2022, there are several operational
satellite systems capable of global image
acquisition at 1 m resolution or better, with
some as fine as 15 cm. These satellites and
related systems are predominantly commercial
enterprises, funded and operated by businesses.
We will describe examples, but given the number
of current systems, and the rate of development
and launch of new systems, the description is
incomplete. In addition, there are several
recently decommissioned high-resolution systems
for which archive data are available and still
useful, e.g., the Ikonos satellite that operated
from 1999 through early 2015 (Figure 6-35), and
the Quickbird system, operational from 2001
through early 2015.

The WorldView Legion series is an example of
the current generation of high-resolution
satellites. There is a planned constellation of
six satellites, with 30 cm panchromatic and
1.16 meter multispectral modes. Multispectral
bands cover the visible and infrared portions of
the spectrum. With pointable optics, the system
will provide up to 15 images of any one location
each day. It has a higher radiometric depth and
spatial accuracy and more frequent revisit
schedule than the preceding WorldView-3 and
other WorldView satellites, which carry a
similar set of panchromatic, visible, and
infrared bands. Off-nadir resolution is poorer
than 30 cm, but satellites in the WorldView
series can image the entire globe at better than
50 cm resolution daily. Nadir images are
collected at approximately 10 a.m. local time, a
common characteristic of these polar-orbiting,
sun-synchronous systems.

Another set of high-resolution images comes
from the Satellite Pour l'Observation de la
Terre (SPOT), versions SPOT-6 and SPOT-7. These
are an evolution of a set of mid-resolution
satellites, SPOT-1 through -5, described in the
next section. The high-resolution satellites
carry a 1.5 m panchromatic and 6 m resolution
multispectral scanner, the latter with four
bands spanning the visible through near-infrared
spectrum (Figure 6-36). SPOT has a 60 km swath
width at nadir. Note that this larger swath
width provides 15 to 40 times the area coverage
of the highest-resolution satellites, and
illustrates a more general trade-off between
satellite image resolution and the area covered
by each image. The set of SPOT satellites has a
daily revisit capability, completely covering
the Earth's landmasses every two months.

Figure 6-35: A 0.5 m resolution image of Venice,
Italy, from the Ikonos-2 satellite.


The Dove satellite cluster by Planet Inc.
carries this notion of a constellation of small,
inexpensive, high-resolution satellites farther,
with a fleet of hundreds of satellites,
approximately the size of a common mailbox,
inexpensively deployed in clusters. The first
satellites were launched in 2013, with a group
deployment of 28 satellites in 2014, and the
full constellation during 2015. There were over
200 operational Dove satellites in mid-2021.


Images have 3 m resolution at nadir (Figure
6-37), although this may reach 5 m in some
configurations. The constellation provides daily
or higher revisit times, with higher return
frequency based on orbital patterns, location on
Earth, and satellite tasking. Images are
stitched together for complete global coverage,
updating a global mosaic on a daily basis.


Satellogic follows a similar model to Planet,
with a constellation of 90 satellites planned.
Their current NuSat series provides both 1-meter
resolution, visible/near infrared images and
30 meter hyperspectral data, with up to 600
spectral bands across a range starting at 0.4
micrometers. At this writing there are 21
satellites, but when the constellation is
complete there will be an opportunity for
multiple images per day for any location on the
Earth.


Figure 6-36: A SPOT-6 image of Bora-Bora,
demonstrating the high resolution over a
relatively large area available from the system
(courtesy SPOT Image Corp.).

Figure 6-37: The Cameron Peak fire in Colorado,
imaged at 3 m resolution by the Dove/Planet
constellation. Hundreds of satellites allow
image repeat times of hours or less, allowing
fire monitoring in near real time (courtesy
Planet Inc.).


There are a number of other high-resolution
satellite imaging systems, some with a local
focus, e.g., the KOMPSAT-2 satellite, designed
to collect data primarily over eastern Asia, and
the Cartosat satellite, providing 0.9 m
resolution panchromatic data, primarily focused
on south Asia. Other high-resolution satellite
constellations include Skysat, DMC3, and Gaofen.

From the above, we see that there is quite a
range and large number of high-resolution
satellite systems. The newer systems provide
panchromatic and multispectral data at
frequencies of a few hours to a few days, often
at sub-meter accuracies. These systems will
provide data needed for large-scale, large-area
spatial analysis, such as urban infrastructure
assessment or planning, crop monitoring,
transportation monitoring and planning, and
flooding and other disaster assessment and
response; in short, for activities requiring
high spatial and up to daily temporal
resolutions, over up to mid-sized regions, e.g.,
cities to county areas.


Mid-Resolution Satellite Systems


There are several mid-resolution satellite
systems, here defined as those providing images
with resolutions from 5 m to less than 100 m.
These are most often used for medium- to
broad-area analyses, for example, landcover
mapping at state or province, regional, or
national extents, or large-area disaster
management. Individual image collections are
generally several tens to hundreds of kilometers
on a side, and revisit times range from a few
days to a few weeks.


SPOT 


SPOT is one of the longest running,
uninterrupted mid- to high-resolution satellite
imaging systems. The French Government led the
development of SPOT, with SPOT-1 launched in
early 1986. There were four additional SPOT
mid-resolution satellites, labeled two through
five, all since decommissioned. The two
operating high/mid-resolution upgrades, SPOT-6
and -7, were described in the previous section
because they offer high-resolution images as
well as mid-resolution. The mid-resolution
sensor provided visible through mid-infrared
bands at 10 to 20 m. SPOT data are routinely
used in a number of resource management, urban
planning, and other applications.

There is a large archive of early SPOT 
images, useful for time series analysis and 
change detection, but the latest SPOT satel- 
lites are nearing their designed lifespan, and 
at this time there is no plan for a replacement.


Landsat 


The Landsat-9 satellite is the latest in the
longest running series of mid-resolution imaging
satellites. Landsat-9 collects a 15 m resolution
panchromatic band, 8 multispectral bands at 30 m
in the visible, near-infrared, and mid-infrared
portions of the spectrum, and two bands in the
thermal infrared range with a 100 m resolution.
The system has a 185 km swath width at nadir and
a repeat interval of 16 days. When combined with
the similar Landsat-8 mission, there will be an
8 day return interval.
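
The 8 day figure follows from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes identical satellites spaced evenly along the same repeating orbit pattern, which is an idealization rather than a description of actual mission operations.

```python
def combined_revisit_days(repeat_days, n_satellites):
    """Return interval when identical satellites are evenly phased.

    Assumes every satellite has the same repeat cycle and that the
    satellites are offset so their passes are equally spaced in time.
    """
    if n_satellites < 1:
        raise ValueError("need at least one satellite")
    return repeat_days / n_satellites


# Two satellites, each with a 16 day repeat interval
print(combined_revisit_days(16, 2))  # prints 8.0
```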


Landsat-9 uses an instrument called the
Operational Land Imager-2 (OLI-2) to collect
non-thermal bands. The specific bands used were
selected to be compatible with previous Landsat
missions, and to improve cloud detection and
aerosol/atmospheric haze analysis. The OLI-2
also increases the bit depth, or data width,
from 12 to 14 bits over Landsat-8, giving a
broader and more sensitive response, and
clearer, more detailed images.

Because Landsat was the first Earth-observing
satellite system and it has operated nearly
continuously since 1972, there is an image
repository spanning five decades. The majority
of these images (Figure 6-38) are available free
of charge to anyone with an internet connection,
allowing long-term monitoring and analysis.
Landsat-9 images are processed and added to this
archive, typically within a few days of
collection, resulting in an inexpensive source
of broad-scale images. This long time series is
particularly appropriate for change analysis,
provided the differences between legacy and new
data resolutions and formats are addressed.

Figure 6-38: An example of a Landsat image,
showing the Mississippi River Delta (courtesy
NASA).

Landsat satellites have carried a range of
sensors, starting with a Multi-Spectral Scanner
(MSS) at 80 m resolution, then a Thematic Mapper
(TM) or Enhanced Thematic Mapper (ETM+), with
30 m resolutions, and now the OLI-2 with up to
15 m resolution. Band wavelengths have varied,
but generally included visible and infrared
portions of the spectrum. The satellites have
had a 16 to 18 day return interval.

Landsat is used in many projects worldwide
because of the breadth of radiometric bands, the
large scan area for individual images, the long
data record, and no-cost data. Landsat is the
basis of many statewide and national land cover
mapping projects, and it has been used to assess
forest health, urban growth, and water quality
in lake and coastal areas. Landsat is
particularly appropriate for change detection,
and much work has established methods for
radiometric correction through time and across
sensors, so that the time series of images may
be used to map urban growth, vegetation change,
and trajectories in water quality.


Sentinel 


The Sentinel system, launched by the European
Space Agency (ESA), is comprised of six
missions, including atmospheric, oceanic, and
land resources (Figure 6-39). Each mission
consists of two satellites, typically in offset
orbits to provide maximum coverage and frequent
repeat observations.

Sentinel-2 frequently contributes to GIS, as it
provides land surface measurements including
landcover classification, vegetation type,
structure, and health, and snow cover and
hydrology. It collects images in 12 spectral
bands, from below blue wavelengths through
shortwave infrared. Cell resolution varies by
band from 10 to 60 meters, with a full
complement of visible and near-infrared bands
provided at 10 m resolution. Images are in
290 km (180 mi) wide, variable length strips.
Data are recorded in 12 bits, providing 16 times
the radiometric resolution of 8 bit data. The
two satellites together provide a five-day
return frequency and global coverage, high among
mid-resolution satellites. Data are free after
registration with an ESA distribution portal.
These are among the most widely useful
mid-resolution data available at this writing.
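
The relationship between bit depth and radiometric resolution can be checked with a short calculation. This is a minimal sketch with hypothetical function names; it assumes radiometric resolution is counted simply as the number of distinct values a cell can store, two raised to the number of bits.

```python
def gray_levels(bits):
    """Number of distinct values a cell can hold at a given bit depth."""
    return 2 ** bits


def level_ratio(bits_new, bits_old):
    """How many times more levels the deeper format provides."""
    return gray_levels(bits_new) // gray_levels(bits_old)


print(gray_levels(8), gray_levels(12))  # prints 256 4096
print(level_ratio(12, 8))               # prints 16
print(level_ratio(14, 12))              # prints 4
```

The second ratio corresponds to a change from 12 bit to 14 bit data, as with the OLI-2 instrument described for Landsat-9.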


Coarse-Resolution, Global Satellite Systems


Coarse-resolution sensors are here defined as
those with pixel dimensions of 250 meters or
larger. These are used for large national to
global analysis, where smaller pixel dimensions
result in unwieldy data volumes, and the things
of interest span continental to global extents.
There are several past, currently available, and
soon-to-be-launched coarse-resolution sensors;
we will describe two representative systems.
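
To see why fine pixels become unwieldy at global extents, the sketch below counts the cells needed to cover an area at different pixel sizes. The function names are hypothetical, and the Earth surface area of roughly 510 million square kilometers is an approximate value supplied here as an assumption, not a figure from this chapter.

```python
EARTH_SURFACE_KM2 = 510_000_000  # approximate total surface area (assumption)


def pixels_for_area(area_km2, pixel_m):
    """Number of square cells of side pixel_m needed to tile area_km2."""
    pixel_km2 = (pixel_m / 1000.0) ** 2
    return area_km2 / pixel_km2


def relative_volume(fine_m, coarse_m):
    """How many fine cells fit in one coarse cell (one band, same bit depth)."""
    return (coarse_m / fine_m) ** 2


# One band of global coverage at three pixel sizes
for size_m in (10, 250, 1000):
    print(size_m, "m:", f"{pixels_for_area(EARTH_SURFACE_KM2, size_m):.3g}", "cells")

print(relative_volume(10, 250))  # prints 625.0
```

Cell counts grow with the square of the resolution ratio, so moving from 250 m to 10 m cells multiplies the data volume per band by several hundred.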

The Visible Infrared Imaging Radiometer Suite,
or VIIRS, is an instrument created to collect
data for weather, ocean, and land surface
analysis (Figure 6-40). It collects 9
visible/near-infrared bands plus a day/night
band, 8 mid-infrared bands, and 4 long-wave
infrared bands. VIIRS collects data at both 375
and 750 m resolutions, with a 3,040 km swath
width, providing global coverage on a daily
basis. VIIRS is a substantial improvement over
previous coarse-resolution satellites in several
ways. Bands are tailored to provide enhanced
information in specific windows, with improved
vegetation, ocean color and productivity, land
and ocean temperature, and cloud, fire, smoke,
and sea ice detection. VIIRS data are freely
available via U.S. NOAA portals.


MODIS is a NASA research system nearing the end
of its life that collects data at a range of
resolutions and wavebands, from visible through
thermal infrared bands. It is described here
because it was the first global,
coarse-resolution system designed for both land
and water imaging, and because there is an
extensive archive of past images. Resolutions
vary from 250 m to 1 km, with a repeat frequency
of every one to two days for the entire Earth's
surface at 1 km resolution. Thirty-six bands are
collected when operated in the 1 km mode,
ranging from 0.4 µm to 14.4 µm. Only two bands
have the 250 m resolution, one each in the red
and infrared portions of the light spectrum.
These data were somewhat unique when introduced
in that the resolution was finer than the 1 km
resolution of previous daily coverage satellite
data, but substantially coarser than Landsat,
SPOT, and other moderate-resolution satellites
(Figure 6-41).


Other Systems 


A number of other systems provide data for GIS,
perhaps chief among them various radar-based
satellite systems. Radar wavelengths are much
longer than those of optical remote sensing
systems, from approximately one to tens of
centimeters, and may be used day or night,
through most weather conditions. Radar images
are panchromatic, because they provide
information on the strength of the reflected
energy at one wavelength. Radar systems have
been successfully used for topographic mapping
and some landcover mapping, particularly when
large differences in surface texture occur, such
as between water and land, or forest and
recently clearcut areas. Operational systems
include the ERS-1, operated by the European
Space Agency; the JERS-1, by the National Space
Development Agency of Japan; and the Radarsat
system, developed and managed by the Canadian
Space Agency.

Figure 6-41: A MODIS 250 m resolution image of
northern Italy and Switzerland. The snow-covered
Alps cross through the center of this image,
north of the Po River valley in Italy. Turbidity
in the Mediterranean Sea and variation in land
cover are also visible (courtesy NASA).

There are several other airborne and satellite
remote sensing systems that are operational or
under development, with the paper by Ustin and
Middleton listed at the end of this chapter
providing a useful summary for environmental
applications, and similar references found in a
web search by application area. Although some
systems are quite specialized, each is a
potentially useful source of data for GIS. Some
may introduce entirely new technologies, while
others replace or provide incremental upgrades
to existing systems. Given the rapid pace of
development, one is well served by keeping
abreast of space imaging news.


Satellite data often require specialized
processing before use in a GIS, but there are
web archives designed to ease access. Data are
provided that are corrected for atmospheric or
sensor effects, to remove distortion, and
converted to standard map projections. Google
Earth Engine is one such archive, with a
comprehensive stack of satellite and derived
data available through several standard
programming interfaces.


A radar image of the Teide volcano on Tenerife
Island, Spain. Radar images may be collected
through clouds and at night.


Satellite Images in GIS 

Satellite images have two primary uses in GIS.
First, satellite images are often used to create
or update landcover data layers. Satellite
images are appropriate for landcover
classification by virtue of their uniform data
collection over large areas. Landcover classes
often correspond to specific combinations of
spectral reflectance values. For example,
forests often exhibit a distinct spectral
signature that distinguishes them from other
landcover classes (Figure 6-43).

Satellite image classification involves
identifying the reflectance patterns associated
with each landcover class and then applying this
knowledge to classify all areas of the image.
Methods have been developed to facilitate
landcover mapping using satellite data, as well
as techniques for testing the classification
accuracy of these landcover data. Regional and
statewide classifications are commonly
performed, and these data are key inputs in a
number of resource planning and management
analyses using GIS.
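
One simple way to apply class reflectance patterns to an image is a minimum-distance-to-mean classifier, sketched below. This is only one of many possible approaches and is not presented as the method used by any particular mapping project. The function names, the two-band layout, and all reflectance values are hypothetical.

```python
import math


def class_means(training):
    """Mean reflectance per band for each class.

    training maps a class label to a list of samples; each sample is a
    tuple with one reflectance value per band.
    """
    means = {}
    for label, samples in training.items():
        n_bands = len(samples[0])
        means[label] = tuple(
            sum(sample[band] for sample in samples) / len(samples)
            for band in range(n_bands)
        )
    return means


def classify_pixel(pixel, means):
    """Assign the class whose mean signature is closest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


def classify_image(image, means):
    """Classify every cell of a raster stored as rows of per-band tuples."""
    return [[classify_pixel(pixel, means) for pixel in row] for row in image]


# Hypothetical training reflectances as (red, near-infrared) pairs
training = {
    "forest": [(0.04, 0.45), (0.05, 0.50)],
    "water": [(0.03, 0.02), (0.02, 0.01)],
    "urban": [(0.20, 0.25), (0.22, 0.27)],
}
means = class_means(training)
image = [[(0.05, 0.40), (0.03, 0.03)],
         [(0.19, 0.24), (0.06, 0.48)]]
print(classify_image(image, means))
```

Operational classifications typically use more bands, many more training samples, and an independent accuracy assessment, but the two steps shown here (summarize the training signatures, then label every cell) mirror the description above.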

Satellite images are also used to detect and
monitor change. The extent and intensity of
disasters such as flooding, fires, or hurricane
damage may be determined using satellite images.
Urbanization, forest cutting, agricultural
change, or other changes in land use or
condition have all been successfully monitored
and analyzed based on satellite data. Change
detection often involves the combination of new
images with previous landcover, infrastructure,
or other information in spatial analyses to
determine the extent of damage, to direct
appropriate responses, and for long-range
planning.
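
A basic form of change detection compares two classified rasters of the same area, cell by cell. The sketch below tabulates the class transitions and the fraction of cells that changed. It assumes both grids are already co-registered with identical dimensions, and the function names and sample grids are hypothetical.

```python
from collections import Counter


def change_matrix(before, after):
    """Count cells for each (earlier class, later class) pair."""
    if len(before) != len(after):
        raise ValueError("grids must have the same number of rows")
    counts = Counter()
    for row_before, row_after in zip(before, after):
        if len(row_before) != len(row_after):
            raise ValueError("grids must have the same number of columns")
        for old, new in zip(row_before, row_after):
            counts[(old, new)] += 1
    return counts


def changed_fraction(before, after):
    """Fraction of cells whose class differs between the two dates."""
    counts = change_matrix(before, after)
    total = sum(counts.values())
    changed = sum(n for (old, new), n in counts.items() if old != new)
    return changed / total


# Hypothetical landcover grids for two dates
before = [["forest", "forest"], ["forest", "water"]]
after = [["forest", "urban"], ["urban", "water"]]
print(change_matrix(before, after)[("forest", "urban")])  # prints 2
print(changed_fraction(before, after))                    # prints 0.5
```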

There are many examples of landcover data
created across large areas, from states through
continents. The Multi-Resolution Land
Characteristics (MRLC) project aims to map
landcover for the United States on approximately
decadal frequencies, and the Corine project aims
to map all of Europe at that or greater
frequency (Figure 6-44).


Aerial or Satellite Images: Which 
to Use? 


The value of satellite and aerial images for
GIS should be clear. Several sources are often
available for a given study area. An obvious
question is "Which should I use?" A number of
factors drive this choice.


First, the image data should provide the
necessary spatial resolution. The resolving
power of a system is generally defined by the
smallest high-contrast object that can be
detected, and is often approximately the pixel
size. Current high-resolution satellite systems
have effective spatial resolutions of 30 cm to
several meters (one foot to tens of feet).
Images from digital mapping cameras, when taken
at typical scales and with commonly used aerial
scanners on planes, resolve objects in the 15 to
100 cm range (six inches to three feet).
UAV-based imaging systems are often deployed to
produce resolutions in the one to 10 centimeter
range. Although the gaps are blurring, this high
to lower resolution ladder still affects choice.

Second, the size of the analysis area should be
considered. Aerial images are often less
expensive for small areas. Aerial images are
often available from government sources at low
cost. Plane-based aerial images often cover from
tens to hundreds of square kilometers, with low
cost per square kilometer. As the size of the
study area increases, the costs of using plane
or UAV-acquired images may increase. Multi-image
mosaics are often needed, raising costs, until
at some point costs often surpass satellite
images for the same area.

Third, satellite scanners may provide a broader
spectral range and narrower bands relative to
aerial images. As noted earlier, satellite
scanners may detect well beyond the visible and
near-infrared spectrums that are more common in
aerial scanners. If important features are best
detected using these portions of the spectrum,
then satellite data are preferred.
Broad-spectrum scanners are available for aerial
and UAV systems, but these are rarer and tend to
lose the cost advantage when compared to
satellite systems. There are often tradeoffs
between the size of the area to be imaged,
resolution, and spectral bands used when
selecting a system.


Finally, accuracy must be considered. Accuracy
generally can't be much finer than the pixel
resolution: you can't measure what you can't
see. However, accuracies are often much poorer
than image pixel sizes, and it is sometimes a
grave mistake to assume image resolution and
accuracies are equal. This is particularly true
for UAV systems, where the GNSS systems used in
drone positioning and for ground coordinate
reference points may not be to as high a
standard as with plane-based systems. Precise
ortho-correction requires several advanced
techniques, including terrain and tilt removal,
lens distortion removal, and ortho-registration.
Professional-grade UAV systems are usually
required for high accuracies. Less expensive,
semi-professional or hobbyist UAV systems are
common, and typically not designed for accuracy.
While available at low cost, they often provide
distorted data. A system should be chosen which
provides both the resolution and accuracy
required.


Airborne LiDAR


A number of laser-based, light detection and
ranging systems (LiDAR) are becoming common.
Lasers are pointed at the Earth's surface from
an aerial or satellite platform, pulses of laser
light are emitted, and the reflected energy is
recorded (Figure 6-45). Like radar, laser
systems are active because they provide the
energy that is sensed. Unlike radar, lasers have
limited ability to penetrate clouds, smoke, or
haze.

LiDAR systems have been used primarily to
gather data about topography, vegetation, and
water quality. Laser pulses reflect back from
the canopy and the ground, and the strength and
timing of the return are used to estimate ground
height, canopy height, and other canopy
characteristics (Figure 6-45). LiDAR signals
over water also typically result in multiple
returns, including water surface height and
various depths, so lasers may be used to measure
water clarity and nearshore water depth.

Commercial LiDAR mapping systems are relatively
new and have been used primarily for collecting
surface data from aircraft and satellites. As
noted earlier, three-dimensional LiDAR surveying
from tripods or ground vehicles is growing, but
we won't expand on it here.

Aerial LiDAR collection systems typically
consist of a downward pointing LiDAR, a
precision GNSS to record the plane's position to
a very high accuracy, and an orientation sensing
system to measure the angle of the LiDAR pulse
relative to the vertical direction. LiDAR energy
pulses are directed downward. Some energy from
each pulse is reflected from vegetation,
buildings, or other features above the ground,
but under most conditions, many signals reach
the ground and return to the airborne laser
platform. The time interval between laser pulse
emission and the ground return may be used to
calculate aircraft height above the terrain.
Flying height is known from the GNSS, and the
terrain elevation is calculated for each pulse.
Pulses may be sent several thousand times a
second, scanning back and forth across the
landscape, so a trace of ground heights may be
measured from every few centimeters to a few
meters along the ground (Figure 6-46).
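
The height calculation described above can be sketched in a few lines. The one-way range is half the round-trip travel time multiplied by the speed of light, and the terrain elevation is the flying height minus the vertical component of that range. The function names are hypothetical, and the sketch ignores atmospheric effects, aircraft roll and pitch, and the horizontal offset of the ground point, all of which a production system must handle.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in a vacuum


def pulse_range_m(round_trip_s):
    """One-way distance to the reflecting surface from round-trip time."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def ground_elevation_m(flying_height_m, round_trip_s, scan_angle_deg=0.0):
    """Terrain elevation under a pulse; scan angle is measured from vertical."""
    vertical_drop = pulse_range_m(round_trip_s) * math.cos(
        math.radians(scan_angle_deg)
    )
    return flying_height_m - vertical_drop


# Hypothetical pulse: aircraft at 1500 m elevation, return after the
# time light needs to travel 1000 m down and 1000 m back
round_trip = 2 * 1000.0 / SPEED_OF_LIGHT_M_S
print(round(ground_elevation_m(1500.0, round_trip), 3))  # prints 500.0
```

The round trip in this example takes less than seven millionths of a second, which is why timing precision matters so much for vertical accuracy.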

Discrete-return LiDAR is most common, wherein
the system records specific values for each
downward laser pulse. Typically, the first
return from a pulse, last return, and perhaps
one to several intermediate returns are
recorded. Waveform LiDAR collects a continuous
record of the pulse returns, the waveform trace
shown in Figure 6-45.

Discrete-return LiDAR systems produce point
clouds (Figure 6-47), consisting of X, Y, and Z
coordinates, and the intensity, scan angle,
return order, and other information. Modern
laser systems often produce densities of several
to tens of points per square meter of ground
area, and these point clouds must be processed
to remove errors, identify ground points, and
assign points to feature types such as buildings
or vegetation. Software for primary processing
has been developed by most vendors so that files
are delivered with the coordinate and height
data assigned to the highest practical accuracy,
and points classified with a standard number
code that indicates the type of feature "hit
by," or associated with, each laser return.
These standard codes identify ground
(value = 2), buildings (value = 6), or water
(value = 9). Several characteristics are used to
classify points by feature type, including
return strength, point order (first, last, or
intermediate), local slope or texture, and the
location and strength of adjacent returns.

Figure 6-46: LiDAR sampling pattern. Each dot
represents a LiDAR return. Scan lines overlap.
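
Once points carry these class codes, selecting a feature type is a simple filter. The sketch below works on an in-memory list of points rather than reading an actual LiDAR file. The record layout, function names, and sample coordinates are hypothetical, and only the three class codes named above are used.

```python
from collections import namedtuple

# Minimal point record; real LiDAR files store many more attributes
LidarPoint = namedtuple("LidarPoint", "x y z classification")

# Standard class codes named in the text
GROUND, BUILDING, WATER = 2, 6, 9


def points_of_class(points, code):
    """Return only the points tagged with the given class code."""
    return [p for p in points if p.classification == code]


def count_by_class(points):
    """Tally how many points fall in each class code."""
    counts = {}
    for p in points:
        counts[p.classification] = counts.get(p.classification, 0) + 1
    return counts


# Hypothetical classified returns
cloud = [
    LidarPoint(10.0, 20.0, 101.2, GROUND),
    LidarPoint(10.5, 20.1, 109.8, BUILDING),
    LidarPoint(11.0, 20.3, 101.0, GROUND),
    LidarPoint(30.0, 40.0, 98.5, WATER),
]
ground = points_of_class(cloud, GROUND)
print(len(ground), count_by_class(cloud))
```

A ground-only subset like this is the usual starting point for building a bare-earth elevation surface.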

There are a growing number of statewide LiDAR
projects, often driven by floodplain mapping or
for improved topographic measurements. Ground
resolutions of 5 cm (2 in) or better are
currently possible when LiDAR is combined with
precise GNSS and aircraft orientation
measurements. These projects report the
"average" point density, but LiDAR returns are
typically collected in swaths across the
landscape, with individual scan lines
discernible when viewed at large scales.
Projects are planned and flown such that an
appropriate amount of overlap exists between
adjacent scans and adjacent flight paths, both
to avoid gaps in coverage and areas with an
unacceptably low sampling density.

Processing extracts the most relevant return
for the desired product. For example, the
maximum first return in a given square area may
be extracted and assigned to a raster cell when
calculating tree height, or a mean or minimum
value when extracting ground heights. Different
processing of the LiDAR point cloud will result
in different extracted values.
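
The sketch below illustrates this kind of processing: points are grouped into square cells, and a chosen statistic (maximum, minimum, or mean) of the heights in each cell becomes the raster value. The function names, the 1 m cell size, and the sample points are hypothetical, and the dictionary-based grid is a simplification of a real raster.

```python
import math
from statistics import mean


def grid_statistic(points, cell_size, reducer):
    """Summarize point heights per square cell.

    points is an iterable of (x, y, z) tuples; the result maps a
    (column, row) cell index to reducer(list of z values in the cell).
    """
    cells = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells.setdefault(key, []).append(z)
    return {key: reducer(heights) for key, heights in cells.items()}


def canopy_height(first_returns, ground_returns, cell_size):
    """Highest first return minus lowest ground return, per cell."""
    top = grid_statistic(first_returns, cell_size, max)
    bottom = grid_statistic(ground_returns, cell_size, min)
    return {key: top[key] - bottom[key] for key in top if key in bottom}


# Hypothetical returns as (x, y, z) in meters
first_returns = [(0.2, 0.3, 120.0), (0.7, 0.8, 124.5), (1.5, 0.2, 118.0)]
ground_returns = [(0.4, 0.4, 100.5), (0.6, 0.1, 100.0), (1.2, 0.9, 101.0)]

print(canopy_height(first_returns, ground_returns, 1.0))
print(grid_statistic(ground_returns, 1.0, mean))
```

Swapping the reducer between max, min, and mean changes the extracted surface, which is the point made above: the same point cloud yields different values depending on how it is processed.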

Horizontal and vertical errors less than a few
centimeters are attainable, allowing the use of
airborne lasers to measure building height
(Figure 6-48), floodplain location and extent,
and slope and derived terrain characteristics,
at much higher density and accuracy, over large
areas, than previously possible.

LiDAR data have also been widely used to
estimate vegetation characteristics, including
tree height, forest density, forest wood
amounts, understory density, growth rates, and
forest type (Figure 6-49). A large number of
points reach the ground in all but the densest
forests, and the ground vs. locally highest
canopy returns usually give an estimate of tree
height that is as accurate as traditional manual
measurements. The proportion of LiDAR returns is
strongly related to canopy density, and to tree
and forest wood mass. Crown shape can be
determined from dense LiDAR data, which in turn
helps separate forest types. LiDAR also can
provide data on understory plant density, even
in full-canopied forests, because at high
sampling densities LiDAR has proven useful at
passing through small canopy gaps.

There is often a choice between LiDAR-based and
image-based elevation measurements, particularly
when the area to be surveyed is relatively
small, on the order of tens to hundreds of
hectares (or acres). Small-format cameras, flown
on drones and properly corrected and processed,
may provide elevation data with a few centimeter
resolution and accuracies, which are also
possible with LiDAR. LiDAR scanners are more
expensive and heavier than small camera systems,
and if there is no need for sub-canopy
information, then photo-based collection systems
may provide the needed elevation data at lower
costs. As the area sampled increases, small
camera drone systems rapidly become unfavorable
due to large photo and data processing volumes,
and short drone flight times between the need to
replace power supplies, resulting in long times
for larger area coverage.

Figure 6-48: An example of LiDAR data and
depiction of building heights. This image shows
lower Manhattan, New York, in late 2001
(courtesy NASA).

LiDAR data are often delivered in a
standard LAS format, maintained by the
American Society of Photogrammetry and
Remote Sensing (ASPRS). The standard
defines the file structure, content, storage
order, naming, codes, and all other
information so that any user may be able to access,
process, and distribute LiDAR data in a
standard way. The standard has evolved through
various versions, up to 1.4 when this book
edition was written. The convention defines
the standard LiDAR exchange file with a .las
file extension, for example, mylidar.las.
Also note that there are competing,
non-standard, compressed formats. Rapidlasso
has specified one compressed format, with
the .laz extension, used by the USGS
National Map. Formats should at a minimum
be openly defined, with all users having
access to the file and storage
specifications, and the ability to write
independent code to read and write the files.
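
An openly defined format is what makes independent reader code possible. The following is a minimal sketch, assuming the public header layout given in the ASPRS LAS specification (a 4-byte "LASF" signature at offset 0, and one-byte major and minor version numbers at offsets 24 and 25). The function name `read_las_version` and the synthetic header are illustrative only.

```python
import struct


def read_las_version(header_bytes):
    """Return (major, minor) from the start of a LAS public header.

    Assumes the layout in the ASPRS LAS specification: a 4-byte
    "LASF" signature at offset 0 and one-byte major and minor
    version numbers at offsets 24 and 25.
    """
    if len(header_bytes) < 26:
        raise ValueError("need at least 26 bytes of header")
    if header_bytes[:4] != b"LASF":
        raise ValueError("not a LAS file: bad signature")
    major, minor = struct.unpack_from("<BB", header_bytes, 24)
    return major, minor


# Synthetic header for illustration: signature, 20 bytes of
# zeroed fields, then version 1.4.
fake_header = b"LASF" + bytes(20) + bytes([1, 4])
version = read_las_version(fake_header)
```

For a real file, the same function could be given the first 26 bytes read from a file opened in binary mode. A production reader would go on to parse point counts, offsets, and scale factors from the rest of the header.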


Image Sources


National, state, provincial, or local
governments are common sources of aerial
images. These photographs are often
provided at a reduced cost. For example, the
National Agriculture Imagery Program
(NAIP) provides coverage of much of the
lower 48 United States on an annual basis.
Images are usually collected in true color,
but color infrared images may also be
acquired, typically at a resolution of 1 meter
or better. Photographs are usually collected
during mid-growing season. The NAIP
program is coordinated through the USDA
Farm Service Agency, and so the
images are sometimes referred to as FSA or
FSA-NAIP photographs. Online and hardcopy
indexes are available to aid in
identifying appropriate image mosaics.

Aerial images may also be purchased
from other government agencies or from
private organizations. The U.S. Geological
Survey (USGS) and U.S. Forest Service (USFS)
routinely take aerial images for specialized
purposes. The USFS uses aerial images to
map forest type and condition, and often
requires images at a higher spatial resolution
and different time of year than those
provided by NAIP. The USGS uses aerial
images in the development of digital
orthophotographs and maps. These organizations
are also excellent sources of historical aerial
images. Many government agencies
contribute to a national archive of aerial images,
some of which may be accessed via the
Internet.


Summary 


Aerial and satellite images are valuable
sources of spatial data. Photos and images
provide large-area coverage, geometric
accuracy, and a permanent record of spatial
and attribute data, and techniques have been
well developed for their use as a data source.

Remote sensing is based on differences
among features in the amount of reflected
electromagnetic energy. Chemical or
electronic sensors record the amount of energy
reflected from objects. Reflectance
differences are the basis for images, which may in
turn be interpreted to provide information on
the type and location of important features.

Aerial images are a primary source of
coordinate and attribute data. Camera-based
mapping systems are well developed, and
are the basis for most large-scale
topographic maps currently in use. Camera tilt
and terrain variation may cause large errors
on aerial images; however, methods have
been developed for the removal of these
errors. Terrain-caused image displacement is
the basis for stereo photographic
determination of elevations.

Satellite images are available from a
range of sources and for a number of specific
purposes. Landsat, the first land remote
sensing system, has been in operation for
nearly 30 years, and has demonstrated the
utility of satellite images. SPOT, AVHRR,
Ikonos, and other satellite systems have been
developed that provide a range of spatial,
spectral, and temporal resolutions.

Aerial and satellite images often must be
interpreted to provide useful spatial
information. Aerial images are typically interpreted
manually. An analyst identifies features
based on their shape, size, texture, location,
color, and brightness, and draws boundaries
or locations, either on a hardcopy overlay, or
on a scanned image. Satellite images are
often interpreted using automated or
semi-automated methods. Classification is a
common interpretation technique that involves
specifying spectral and perhaps spatial
characteristics common to each feature type.

The choice of photographs or satellite
imagery depends on the needs and budgets
of the user. Aerial images often provide
more detail, are less expensive, and are
easily and inexpensively interpreted for small
areas. Satellite images cover large areas in a
uniform manner, and sense energy across a
broader range of wavelengths.

LiDAR data are becoming a widespread
source of spatial data. Discrete-return
LiDAR systems are prevalent, providing X, Y, and Z
coordinates for ground and above-ground
feature returns. Most new, high-resolution
digital elevation models are based on
LiDAR data, and building and forest
features are routinely extracted from LiDAR.
Statewide acquisitions are becoming
common, and system resolution and collection
frequency are likely to improve through
time.


Unmanned aerial vehicles (UAVs), also
known as drones, show promise as spatial
data collection tools. Lower costs, increased
flexibility, and greater detail must be
weighed against limitations in throughput
and hence area imaged, variability in
accuracy, and regulatory uncertainty.

Study Questions

6.1 - Describe several positive attributes of images as data sources.

6.2 - What is the electromagnetic spectrum, and what are the principal wavelength
regions?

6.3 - Define a spectral reflectance curve. Draw typical curves for vegetation and soil
through the visible and infrared portions of the spectrum.

6.4 - Describe the structure and properties of digital sensors in digital aerial cameras.

6.5 - What are the basic components of a camera used for taking aerial photographs?

6.6 - A camera has a 2.4 cm square digital sensor, at a flying height of 100 meters,
and a focal length of 35 mm. What is the ground distance, in meters, of one side of
the area covered?

6.7 - A camera has a 3.2 cm square digital sensor, at a flying height of 80 meters, and
a focal length of 24 mm. What is the ground distance, in meters, of one side of the
area covered?

6.8 - You are planning a drone mission, and know your camera has a 10 µm sensor
cell size, and a 34 mm focal length. What flying height should you set if you want a 5
cm ground resolution?

6.9 - You are planning a drone mission, and know your camera has a 16 µm sensor
cell size, and a 28 mm focal length. What flying height should you set if you want a 3
cm ground resolution?


6.10 - Describe the most commonly used camera band combinations for aerial
photography, and their relative advantages.


6.11 - What are the major sources of geometric distortion in aerial images, and why?
What are other, usually minor, sources of geometric distortion in aerial images?


6.12 - What are typical magnitudes of geometric errors in uncorrected aerial images? 
How might these be reduced? 
6.13 - Identify the type of distortion in the view on imaged square grids below:

[Figure: imaged square grids a) and b)]

6.14 - Identify the type of distortion in the view on imaged square grids below:

[Figure: imaged square grids a), b), and c)]

6.15 - A tall building is recorded on two vertical aerial photographs, the first
photograph at a nominal scale of 1:20,000, the second photograph at a nominal scale of
1:40,000. The building is near the edge of both photographs, and terrain is level
throughout the photograph. Which image will show a larger displacement, d, as
shown in Figure 6-23?


6.16 - Describe stereo photographic coverage, and why it is useful. 
6.17 - What is parallax, and why is it useful? 

6.18 - Describe the basic process of terrain distortion removal. 

6.19 - What is the displacement, d, in millimeters, shown in Figure 6-23, given
the following conditions: p = 50 mm, h = 40 meters, H = 1,000 meters (there are 1,000
mm to one meter)?

6.20 - What is the displacement, d, in millimeters, shown in Figure 6-23, given
the following conditions: p = 50 mm, h = 15 meters, H = 300 meters (there are 1,000
mm to one meter)?
6.21 - Why do the buildings lean in different directions in the images below?


6.22 - What is photointerpretation, and what are the main image characteristics used 
during interpretation? 


6.23 - How are images from satellite scanners different from photographs? How are
they similar?

6.24 - Why is relief displacement usually less for satellite systems than for aerial
camera images?

6.25 - What is LiDAR? What type of information can LiDAR produce?

6.26 - What are three criteria used in selecting the type of images for spatial data
development?

7 Digital Data


Introduction 


Many spatial data currently exist in
digital forms. Roads, political boundaries, water
bodies, land cover, soils, elevation, and a
host of other features have been mapped and
converted to digital spatial data for much of
the world. Because these data are often
distributed at low or no cost, these existing
digital data are often the easiest, quickest, and
least expensive source for much spatial data
(Figure 7-1).


Figure 7-1: Examples of free digital data available at a range of themes, extents, and scales. Vector,
raster, and digital graphic data are shown for Kauai, Hawaii, USA.

Data are increasingly collected in digital
formats. GNSS, laser measurements, and
satellite scanners all provide primary data in
digital forms. They are directly transferable
to other digital devices and GIS systems,
where they may be further processed. Direct
digital collection should reduce transcription
errors and help maintain source and
processing history.

Digital data are developed by
governments because these data help provide basic
public services such as safety, health,
transportation, water, and energy. Spatial data aid
disaster planning, national defense, and
infrastructure development and
maintenance. Many national, regional, and local
governments have realized that once these
data have been converted to digital formats
for use within government, they may also be
quite valuable for use outside government.
Business, non-profit, education, and science
organizations may draw benefit from the
digital spatial data, as these organizations
benefited in prior times from
government-produced paper maps. Some data commonly
available throughout the United States and
the world are described in this chapter.


Map Services vs. Locally Stored 
Data 


We must distinguish between data that
are available for transfer to, storage on, and
manipulation in a local computer (locally
stored), and those data that are available as
Web services, including Web Mapping
Services (WMS), Web Feature Services (WFS),
and Web Coverage Services (WCS). Digital
data were first distributed on physical media,
then via the Internet, but typically as
electronic files that were copied onto a local
storage device for use. You maintained a
copy on your device, and manipulated those
locally stored data. A WMS eliminates the
need for a local copy.

A Web service is a standard way of
serving geographic data over the Internet. GIS
software accesses data via an Internet
connection and displays these data on a local
machine, although they are "served" from
some remote computing system. Image data
are most often served, but vector data may
also be provided, usually in the form of a
georeferenced map backdrop. The data don't
reside on the local hard disk and are
delivered in response to each pan, zoom, or other
change in display.

There are many differences among
WMS, WFS, and WCS, but in broad strokes,
a WMS is for serving geographic data for
displaying maps, while WFS
(vector) and WCS (primarily raster) deliver
data and metadata in ways that ease spatial
processing and analysis. Details of the
differences are specified in standards
documented by the Open Geospatial Consortium,
www.opengeospatial.org/docs/is. Most data
through services are currently provided as
WMS, with few systems supporting and
using the editing and analysis functions
available through WFS and WCS.
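
To make the request-and-response nature of a WMS concrete, here is a minimal sketch that builds a GetMap request URL using the standard query parameters defined for WMS version 1.3.0. The server address, layer name, and bounding box are hypothetical placeholders; a real service advertises its layers and supported coordinate systems through its capabilities document.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox holds four numbers in the axis order of the chosen CRS.
    For EPSG:4326 under WMS 1.3.0 that order is
    (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string requests default styling
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,          # output image size in pixels
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; the box roughly covers Kauai.
url = wms_getmap_url("https://example.com/wms", "elevation",
                     (21.8, -159.8, 22.3, -159.3), 512, 512)
```

Each pan or zoom in a GIS client amounts to a new request of this kind with a different bounding box, and the server returns only a rendered image, which is why WMS data can be displayed but not substantially edited.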

Web services are both better and worse than
local data storage. Web services save space
on the local hard drive, and only the portion
of interest from a large data set need be
accessed. The most up-to-date information is
provided to a wide set of users. Many
different kinds of data may be joined together
more easily, as accessing Web services
typically requires a few mouse clicks. However,
you may often not manipulate or change
WMS data in any substantial way, and some
kinds of analysis may not be supported or
allowed. In these cases, local copies of the
data may be required, or WFS or WCS
developed. Map services may also require a
fast and reliable Internet connection,
particularly for large raster data sets.

For the remainder of this chapter, we 
will primarily focus on data available for 
download, as these have the fewest barriers 
to use in analysis. Through time, many of 
these data may be offered via Web services, 
and software will ease use of Web-served 
data in analysis. 

This chapter introduces an alphabet 
soup of acronyms, many of which are famil- 
iar to GIS practitioners. We include a list at 
the end of this chapter as a reference. 
Global Digital Data


Global data sets are available, generally
at coarse to medium resolution, and often
with idiosyncratic completeness, depending
on the data layer. Global data are variable
because few governments collect spatial data
in the same way or with the same set of
attributes. Different governments specify
different datums, standard map projections, data
variables, and attributes, or have different
requirements for survey accuracy or
measurement units. Data reduction or
documentation methods may be different across
national boundaries. There is substantial
work in reconciling differences across
national boundaries; therefore, global data
sets are only occasionally built from a
composite of national data sets. The collection of
Natural Earth data sets
(http://www.naturalearthdata.com) is a good
example of useful, global, homogenized
data. It is a volunteer collaboration for
creating consistent, high-quality data suitable for
small- to medium-scale mapping (Figure 7-2),
and includes many physical, cultural, and
natural data layers.


Satellite-based data perhaps dominate
data sets with uniform global coverage,
because data may be collected over large
areas using one platform and processed
using a standard set of methods. For
example, worldwide vegetation characteristic
data have been developed, such as Landsat,
MODIS, or VEGETATION canopy cover
at cell sizes from 30 meters through eight
kilometers. Using a uniform global data source
avoids the problem of reconciling
differences among disparately collected data sets,
but substantially reduces the number and
type of global data sets that may be obtained.
A limited set of data may be derived
from satellite images. The features of
interest must be visible from satellites, and
there must be an organization interested in
collecting and processing global data.
NASA and the European Space Agency
(ESA) provide a large and diverse group of
global spatial data sets, due to their
leadership in the development and application of
satellite images at a range of spatial scales.
NASA operates Distributed Active Archive
Centers (DAACs), central nodes for data
access, in addition to project-specific sites
for data distribution, e.g., Landsat satellite
and derived data (Figure 7-3). Global raster
data sets include elevation, land use,
ecosystem type, and a number of measures of
vegetation productivity, phenology, structure, and
health.

University centers or ad hoc
collaborations are other rich sources of global data.
One example is the Center for International
Earth Science Information Network,
administered by the Earth Institute at Columbia
University (www.ciesin.org). It seeks to
provide global data to better address
environmental problems. Another example is
the Global Forest Watch at the University of
Maryland
(http://glad.umd.edu/projects/global-forest-watch).

Global spatial data sets are often
organized around a theme. For example, the Max
Planck Institute of Environment and Climate
in Germany has led an effort to create a
gridded data set of historical global precipitation
by combining data from 40,000
meteorological stations in 173 countries. These data are
compiled, quality checked, and processed to
create gridded data sets for normal
precipitation. Data sets of annual anomalies, the
number of gages, and systematic error are also
provided. This was an expensive and
time-consuming undertaking due to the number of
different methods used to collect and report
precipitation. Considerable time was spent
reconciling data collection methods and
results. A more complete description of
these data is found at http://gpcc.dwd.de.
There is a similar effort for undersea
topography in the Global Multi-Resolution
Topography synthesis (GMRT), combining new
ocean multi-beam sonar data with previous
ocean depth data to continuously update
ocean topography.

Global data may also derive from a
specific space-based mission or platform, e.g., the
Tropical Rainfall Measuring Mission
(TRMM) or the Shuttle Radar Topography
Mission (SRTM). TRMM involved
satellite-based measurements of rainfall in tropical
and sub-tropical regions from 1997 to 2015.
It substantially increased our catalog of
observations in previously sparsely
measured regions, and the data are accessible as
rasters of various metrics, including mean
and variation in precipitation by various time
periods.

Figure 7-3: Global forest canopy cover derived from NASA Landsat data (courtesy NASA).

SRTM was the first program to
uniformly measure elevation worldwide,
attempting complete coverage. Two radar
signals were used, C- and X-bands, over an
11-day period in February, 2000, mapping
more than 99% of the land between 60N and
60S latitudes. Data are processed to measure
heights at various horizontal and vertical
resolutions, down to 30 m.


Global Spatial Data Infrastructure


Given the substantial difficulties in
compiling data from disparate global sources, the
Global Spatial Data Infrastructure (GSDI)
initiative was formed. GSDI is an attempt to
coordinate collection and processing
methods worldwide to ensure that spatial data are
broadly suitable for global-level analysis.
The primary goal is to improve the
development, use, and sharing of spatial data across
the globe. This will be achieved through the
adoption of common standards and
complementary policies across governments and
regions.

The GSDI initiative began in the late
1990s, and is still a work in progress.
Activities during the first few years included
identifying participants, developing goals and
organizational structure, and identifying and
prioritizing early actions. Activities on the
GSDI initiatives may be found at
www.gsdi.org.

The Global Map is one early GSDI
initiative. The Global Map specifies common
thematic layers: boundaries, elevation, land
cover, vegetation, transportation, population
centers, and drainage. Scale, feature classes,
feature types, and feature names are
specified, as are attributes, metadata, tiling
schemes, and delivery mechanisms.
Countries submit data to the Global Map project,
which then serves as a distribution node.

OpenStreetMap


OpenStreetMap (OSM) is one notable
effort to develop global data through
international volunteer collaboration (Figure 7-4).
Much like Wikipedia, this is an open access,
user-generated resource. Individual users
register and can check out data sets to
modify. Roads and other transportation
infrastructure are digitized, typically from image
interpretation or via GNSS, and submitted
for database integration. As with many
online collaboratives, there are protocols for
review and resolving conflicts, and data may
be downloaded in various formats from
OpenStreetMap or companion sites. These
are often the best data in areas with poorly
developed mapping infrastructure.


While OpenStreetMap provides the best
data in many regions, there are potential
drawbacks with these data. Because it is a
collaborative effort, documentation and
uniformity may be lacking. A range of sources,
abilities, and methods may be used to
develop data, and documentation on these
sources may be unavailable. In addition, data
may not be complete, depending on how
much volunteer effort has been directed at an
area, and the pace of change. Given these
drawbacks, the data should be verified for
accuracy and completeness, or at least
suitability for the intended use, prior to
adoption. This is true for all data, but the burden
perhaps falls more heavily on the user with
crowd-sourced data. Given the richness of
detail of OSM data, it is well worth the
effort.

Another perhaps slight barrier to use
may be the method of distribution.
Currently, the data may be downloaded from the
primary website in a well-defined but
little-used data format. Data are available in more
standard formats from third-party service
websites, and the native OpenStreetMap
formats are supported by some software (e.g.,
QGIS), and will surely achieve broader
support in the future. In spite of these potential
drawbacks, these open-source collaborations
have a bright future, and may well become
standard for many types of data.

Figure 7-4: An example of OpenStreetMap data.


Other General Distributions 


There are several global image archives,
often focused on a specific satellite platform
or initiative. For example, the Landsat
system described in the previous chapter has
been collecting data since the early 1970s.
The LandLook initiative allows a global
search for these data back to inception.
LandLook also supports Sentinel data
search, browse, and download. Similar
archives exist for SPOT and other
long-running, government-funded platforms.

NASA hosts a diverse set of data through
their Land Processes Distributed
Active Archive Center. They include various
global digital elevation data sets, among
them the Shuttle Radar Topography Mission
(SRTM) and the ASTER satellite global
elevation data, providing uniformly processed,
worldwide elevation rasters at various cell
sizes, including 60 and 100 meter
versions. There is a comprehensive archive
of layers derived from MODIS satellites,
including global vegetation density and type,
forest cover change, phenologies, and
various physical measures, such as albedo and
surface reflectances.

ESRI Open Data is another rich source
of global data, containing a broad range of
categories. In early 2019, over 115,000
different data sets were hosted, to view and
download, depending on permissions. Data
are available for political boundaries,
demography, education, health, agriculture,
economic variables, natural resources, and
other categories. Metadata, links to
downloads and programmer's display interfaces,
and webmap production are provided. While
the collection is U.S. weighted, there are
many international data sets.

The NASA-funded Socioeconomic Data
and Applications Center (SEDAC) provides
substantial data with a global focus.
Population, urban land cover and characteristics,
global agricultural lands and food supply,
roads, and environmental resources use and
sustainability are among the data sets served.
Most data involve a combination of remote
sensing and other data collection efforts in
global or regional estimation, hence the
support by NASA. Data are free to download in
various standard formats, with metadata and
development methods defined.

Figure 7-5: An example distribution dashboard.

The United Nations Environment
Program (UNEP) distributes over 500 global
data sets through the Environmental Data
Explorer. Data are searchable, provided at
national, UNEP regional, or subregional
units of area, and downloadable in standard
formats. Data focus on environmental
themes, broadly defined.


Terra Populus is another global spatial
data portal, focusing on integrated
population and environmental data. It provides
matching data downscaled to compatible
areas, including land use, land cover, and
climate data (Figure 7-5).

There are many other general
distribution sites serving global data, both general
and specific. These are best discovered in
the subject matter literature, and via broad
web searches.

National, Provincial/State, and
Local Digital Data 


Governments at a range of levels
commonly develop, organize, archive, and
distribute national data sets. Many national
governments organize and distribute spatial
data (Figure 7-6). The standardization of
weights and measures is a primary function
of most national governments, and spatial
data may be viewed as measurements of
land, sea, or other national territories.
Governments must oversee the planning,
construction, and management of public
infrastructure such as roads, waterways, and
power distribution systems, and these
activities, among many others, require spatial data
sets that are national in extent.


States and provinces, counties and
departments, cities, and other minor civil
divisions often generate and distribute
spatial data. These data are often considered
part of the public domain, as state resources
have gone into their development. In
general, the grain size decreases along with
extent such that city-level data are often the
most detailed, with the smallest area
covered.


A partial list of available data resources
is included in Appendix B, near the end of
this book.

Digital Data for the United States


National Spatial Data Infrastructure


The United States has defined the
National Spatial Data Infrastructure (NSDI)
as the policies, technologies, and personnel
required to ensure the efficient sharing and
use of spatial data. The goal of the NSDI is
to reduce duplication of effort among
agencies, to improve quality and reduce the costs
of geographic information, to make
geographic data more accessible to the public, to
increase the benefits of available data, and to
establish key partnerships with states,
counties, cities, tribal nations, academia, and the
private sector to increase data availability
(www.fgdc.gov).

The NSDI has identified core data sets,
including geodetic control, orthoimagery,
elevation, transportation, hydrography,
cadastral data (property boundaries), and
governmental unit boundaries. A main NSDI
goal is the efficient development and
distribution of these core data.


The NSDI advocated parallel access to
many data sets across a range of government
agencies. Geoplatform.gov is one result of
U.S. federal government efforts, providing
shared geospatial data, web services, and
applications. The U.S. Geological Survey
(USGS), http://www.usgs.gov, is another
good source of geospatial data from the U.S.
federal government, and many of these data
will be described in the following sections.


The U.S. National Map


Digital data are available for most of the
United States through the National Map
project, described as a cornerstone of U.S.
mapping efforts (http://nationalmap.gov, and
apps.nationalmap.gov/download). Data are
provided on political and civil boundaries,
transportation, hydrography, geographic
names, structures (e.g., dams, notable
buildings, towers, or monuments), elevation,
aerial photographs, and land cover. Some of
these data are available from other dedicated
projects and websites, for example, elevation
datasets and the National Hydrography
Datasets (USGS NHD program and
website). We'll discuss these two data sources in
detail in later sections of this chapter, and
here focus on a general description of the
National Map project and the additional data
available through the National Map.


Data for the National Map come from a
variety of sources, including new primary
data collections from aerial and satellite
images contributed by federal and state
agencies, and older data. Much of the
National Map data are legacies of USGS
hardcopy mapping programs. The USGS
began topographic mapping in the United
States in the 1880s, creating paper maps that
each covered 7.5 minutes of arc on a side,
and comprised about 55,000 tiles for the
lower 48 states. Currently, these digital
topographic maps are delivered as
geographically enhanced portable document format
(PDF) files, or GeoPDFs, with layers for
orthoimagery, roads, place names, elevation
contours, and rivers, lakes, and other
hydrographic features. Layers may be rendered
visible or invisible, and the maps displayed
with other georeferenced data in appropriate
viewers, usually as a background display,
and not for analysis.
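
As a rough illustration of the 7.5-minute tiling, the following sketch finds the bounds of the tile containing a given point. It assumes tiles are aligned to whole multiples of 7.5 minutes of latitude and longitude. The function name `quad_bounds` and the sample coordinate are hypothetical, and no attempt is made to look up official quadrangle names.

```python
import math

QUAD_SIZE_DEG = 7.5 / 60.0  # 7.5 minutes of arc = 0.125 degrees


def quad_bounds(lon, lat):
    """Return (west, south, east, north), in decimal degrees, of
    the 7.5-minute cell containing a longitude/latitude pair."""
    # Snap each coordinate down to the nearest multiple of 0.125.
    west = math.floor(lon / QUAD_SIZE_DEG) * QUAD_SIZE_DEG
    south = math.floor(lat / QUAD_SIZE_DEG) * QUAD_SIZE_DEG
    return (west, south, west + QUAD_SIZE_DEG, south + QUAD_SIZE_DEG)


# Hypothetical point in the upper Midwest of the United States.
bounds = quad_bounds(-93.27, 44.98)
```

With eight tiles per degree along each axis, a one-degree block of latitude and longitude holds 64 such tiles, which gives a sense of why tens of thousands of tiles are needed to cover the lower 48 states.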

The raster and vector data used for
GeoPDFs or similar vector data are available
from the National Map. Downloadable data
include elevation, aerial images,
hydrography, transportation, boundaries, and
geographic place names, among others.


Digital Elevation Data 


Digital elevation data are available at
national to local extent from a number of
sources. Data are most often delivered as
digital elevation models (DEMs) that
provide elevation data in a raster format. These
are available at 10 m or better resolution for
all states but Alaska, and are used
in data analysis and display (Figure 7-7).
DEM manipulations and terrain analysis are
discussed in a later chapter, but for
now, know they are among the most used
spatial data sets in many endeavors. Here we
introduce the sources of U.S. DEM data and
their basic characteristics.


Ground and aerial surveys are the
primary source of original elevation
measurements for most DEMs. Traditional distance
and angle measurements with surveying
equipment were used up until the 1940s to
provide precise elevations at specified
locations. Because these methods are relatively
slow, they provided a sparse network of
points, with a dense network suitable for
elevation mapping over only small areas.
Improved electronic distance meters helped,
as did global positioning system
technologies, but even with these improvements in
survey speed and accuracy, these
technologies are too slow to be the sole elevation data
collection method over all but the smallest
areas.


Aerial images and airborne LiDAR surveys complement field surveying by
increasing the number and density of measured elevations. Accurate
elevation data may be collected over broad areas with the appropriate
selection of aerial mapping technologies. From the 1950s until the
late 1990s, most elevation data were compiled using precise mapping
aerial cameras, complemented by optical ground surveys. Since the late
1990s, LiDAR mapping has been combined with GNSS to more accurately
and rapidly map elevation. These various GNSS and survey methods are
discussed in Chapter 5, and aerial images and LiDAR in Chapter 6.
Laser-based elevation mapping and DEM generation are now common, and
will be used for the foreseeable future to create the
highest-resolution DEMs.

While LiDAR coverage extends each year, many areas have not been
flown. Where LiDAR DEMs are not yet available in the United States,
most DEMs have been developed using metric aerial photographs.
Photogrammetry has been used since the 1930s to map lines of a
constant elevation (contours) and spot heights, and data have been
developed for much of the Earth's surface using these techniques.

DEMs with 3-, 10-, and 30-meter horizontal sampling frequency are
available for much of the United States. Currently, the USGS delivers
these DEMs as part of the 3D Elevation Program (3DEP), through the
USGS National Map portal. Data are available at a 30-meter resolution
for almost all of the United States, at 10-meter resolution for the
lower 48 states and Hawaii, and at 1 and 3 m for a large and expanding
area. The underlying LiDAR data are also available for download. Note
that global 30 m SRTM and ASTER data, described in the global data
section of this chapter, are available for the United States, but are
generally inferior to the 10 m or better resolution data where both
exist. There is a special 5 m radar-based data set restricted to
Alaska.

Figure 7-8: The extent of mountaintop mining, shown in a comparison of
older NED data (left) and year 2000 SRTM data.

GIS users should be cautious because there are several versions of DEM
data for many areas, and they should generally use the more current,
higher accuracy, or higher-resolution data. The existence of various
elevation data sets, covering various time periods, does provide the
opportunity to monitor change through time, for example, broad-scale
mining modifications in Kentucky (Figure 7-8). The most current USGS
data are best accessed through the USGS Earth Explorer or National Map
portals.
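
Monitoring change with DEMs from two dates reduces, at its simplest, to a cell-by-cell subtraction. The following is a minimal sketch, assuming the two grids are already co-registered, share a cell size and vertical datum, and are small enough to hold as nested lists; the function name and threshold are illustrative only.

```python
def elevation_change(dem_old, dem_new, threshold_m):
    """Subtract an older DEM from a newer one, cell by cell.

    Returns the difference grid (new minus old) and a list of
    (row, col) positions where the absolute change exceeds
    threshold_m. Both grids must have identical dimensions.
    """
    if len(dem_old) != len(dem_new) or any(
        len(r_old) != len(r_new) for r_old, r_new in zip(dem_old, dem_new)
    ):
        raise ValueError("DEMs must have the same dimensions")

    diff = [
        [new - old for old, new in zip(r_old, r_new)]
        for r_old, r_new in zip(dem_old, dem_new)
    ]
    changed = [
        (r, c)
        for r, row in enumerate(diff)
        for c, d in enumerate(row)
        if abs(d) > threshold_m
    ]
    return diff, changed
```

The threshold matters in practice, because differences smaller than the combined vertical error of the two data sets are not reliable evidence of change.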


Hydrographic Data

The National Hydrography Dataset consists of several related data
sets, which we collectively call NHD data here, including versions
labeled NHD, NHDPlus, and the in-progress NHDPlus HR.


Naturally occurring and built features are represented in NHD data
(Figure 7-9). These include rivers and streams, watersheds, water
bodies, canals, pipelines, dams, and other natural or control
structures. Attributes may be provided for these features, for
example, a lake type or name, if a dam is earthen or concrete, or
ditch type. Features may be points, lines, or polygons.

NHD data also represent network topology, the connection among stream
features, and include information on connections and flow directions.
Line segments have a designated flow direction, and connections or
crossings may be represented as full connections, or noted as a
bypass, for example, when a spillway or pipeline crosses a river
without the possibility of discharging water into the river. Coding
schemes have been developed to identify each reach in the hydrographic
network, and to represent network connections among reaches.

Figure 7-9: An example of NHD data obtained from the National
Hydrography Dataset. A number of feature types are represented,
including catchments, water flow lines, and lakes. Attribute data are
provided for features, including average flow and catchment area.
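
The idea of flow direction and of full versus bypass connections can be sketched with a small directed network. This is only an illustration: the reach names and the three-part connection records are hypothetical, not actual NHD reach codes or table structures, and we assume each reach drains to at most one downstream reach.

```python
# Each record is (from_reach, to_reach, kind). A "bypass" crossing,
# such as a pipeline over a river, does not discharge into the river.
connections = [
    ("headwater_1", "main_1", "full"),
    ("headwater_2", "main_1", "full"),
    ("main_1", "main_2", "full"),
    ("pipeline_1", "main_1", "bypass"),
]


def build_downstream(connections):
    """Map each reach to its downstream reach, using full connections only."""
    return {src: dst for src, dst, kind in connections if kind == "full"}


def trace_downstream(start, downstream):
    """Follow flow direction from a starting reach to the network outlet."""
    path = [start]
    seen = {start}
    while path[-1] in downstream:
        nxt = downstream[path[-1]]
        if nxt in seen:  # guard against accidental loops in the data
            break
        path.append(nxt)
        seen.add(nxt)
    return path
```

Because the pipeline is recorded as a bypass, a trace starting on it never enters the river, which is the behavior the connection type is meant to capture.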

NHD data are organized by areas, in a hierarchically nested set of
Hydrologic Units, identified by unique Hydrologic Unit Codes (HUCs).
These units correspond to watersheds, or basins, or logical
aggregations or subareas of watersheds (Figure 7-10). The United
States was divided into 21 regions, and these regions further divided
into 222 subregions. Subregions were in turn divided, forming a total
of 352 hydrologic units, and these are further divided into 2,150
hydrologic units. This fourth-level division is for the most part
along major river basins, outlining distinct watersheds, or
intermediate pieces along the main stem of larger rivers. Each of
these divisions is identified by a unique eight-digit code, and so
these areas are also known as HUC-8 boundaries. Regions, subregions,
and subregion divisions are known as HUC-2, HUC-4, and HUC-6 areas,
respectively, providing a nested set of drainage areas for areas
larger than the HUC-8 catchments. HUC-8 catchments may in turn be
split into smaller HUC-10 and HUC-12 catchments, this last size
typically the smallest delineations widely available.

Figure 7-10: An example of nested HUC drainage areas for a portion of
the Cape Fear River watershed in North Carolina. Higher HUC
designators nest within lower ones.
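
Because each level adds two digits to the code of the unit that contains it, the nesting can be recovered from the code alone. The sketch below shows this prefix relationship; the 12-digit code used in the usage note is an illustrative value, and the function names are ours.

```python
HUC_LEVELS = (2, 4, 6, 8, 10, 12)


def huc_ancestors(huc):
    """Return the code of every HUC level that contains the given unit,
    including the unit itself, keyed by level name (for example "HUC-8")."""
    if not huc.isdigit() or len(huc) not in HUC_LEVELS:
        raise ValueError("HUC codes are 2, 4, 6, 8, 10, or 12 digits")
    return {f"HUC-{n}": huc[:n] for n in HUC_LEVELS if n <= len(huc)}


def is_nested_within(child, parent):
    """True if the child unit falls inside the parent unit."""
    return len(child) >= len(parent) and child.startswith(parent)
```

For example, `huc_ancestors("030300040101")` lists the region `03`, subregion `0303`, and so on down to the 12-digit unit, so records at any level can be grouped by a simple string prefix.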


The United States EPA also provides data on waters and watersheds of
various types and formats, organized to correspond to the HUC data at
some levels. EPA River Reach Files organize data in a series of
versions, from RF1 through RF3 data. RF3 data are designed to provide
a nationally consistent hydrographic database that records geography
and assigns unique identifiers to all surface water features. It
allows the hydrologic ordering of reaches so that larger rivers and
segments may be accurately defined, along with river connectedness and
flow direction. RF3 data also record the locations and characteristics
of additional elements, including gages, dams, and other hydrologic
features.

River reach data are precursors to NHD data, and so contain much of
the same base information. RF files are available for most of the
contiguous United States. Tabular data on water chemistry and other
watershed characteristics are available at
http://cfpub.epa.gov/surf/locate/index.cfm.

There are other improved hydrologic data, called NHDPlus and NHDPlus
HR. Managers need consistent elevation, stream, and watershed boundary
data sets at high resolution to solve many water resource science and
planning problems. The original NHD produced in the early 2000s
focused on hydrographic data at 1:100,000 scale. Subsequent work has
focused on improving the accuracy, consistency, and tools to support
NHDPlus data, with subsequent versions using improved digital
elevation data and enforcing consistency with other data sources.


There is an emerging system for storing, finding, and retrieving
hydrologic data associated with the CUAHSI project (www.cuahsi.org).
CUAHSI is a National Science Foundation funded project involving more
than 120 universities to support hydrologic science and education.
CUAHSI HIS is an internet-based system for sharing hydrologic data via
a Web service. As noted earlier, a Web service is a set of protocols
that allow communication among computer programs over the internet. In
GIS these Web services are most often used to streamline data sharing
and access. The CUAHSI
HIS is designed to aid the integration and 


There are other hydrologic data sets available for the United States,
more generalized, and most often used for analysis or display over
larger areas. The USGS produced digital, nationwide data sets based on
paper 1:100,000, 1:250,000, 1:1,000,000, and smaller-scale maps
(Figure 7-11), and these are available from various state, national,
local, and private sources. These data show larger rivers and a
limited set of attributes for each river, most importantly river
names. These data are also not hydrologically continuous, in that many
of the rivers do not maintain their connection through water bodies.
Despite these limitations, they are often used because they may be
more appropriate for statewide or regional analysis involving only the
main stems or larger rivers in a region.

Integrated, consistent, continent to worldwide data are also available
from the "Natural Earth" data projects (www.naturalearthdata.com).
These data are intended for use in cartography, and not primarily for
analysis, as they have been generalized and made consistent primarily
for display rather than geographic accuracy. Different hydrographic
data are offered, targeted at a range of small scales, for regional
through global mapping. A limited set of attributes is available,
including reach or river names and cartographic widths.


Figure 7-11: An example of USGS legacy hydrographic data, with the
national data set depicting the level of detail in these data.


High-Resolution Digital Images


Digital images are available from a range of sources, including
national, state, and county governments, or from private contractors,
satellite imaging companies, and resellers. High-resolution digital
image data are typically collected every five to ten years by the
USGS, in partnerships with states or other government agencies. Image
archives go back to the 1950s for most of the United States, with
state and countywide coverage as far back as the 1930s. Nationwide
coverage was completed in the 1980s and 1990s through the NHAP and
then NAPP programs, at scales in the 1:40,000 to 1:60,000 range. These
formed a primary basis for digital orthophoto quads, or DOQs, the
first orthographic, high-resolution digital images with national
coverage. Most other images from before the early 2000s were
film-based, although some have been scanned and are available from the
United States EROS Data Center. Image sets include the historical
black and white aerial photographs, nationwide programs of the 1980s
and 1990s, high-resolution coastal images, radar, and other special
collections.
The High-Resolution Orthophotographs (HROs) series are among the
highest resolution, widely available image data sets. The HRO images
are collected and distributed through the USGS with current coverage
for about 1/3 of the U.S. These data are often collected at 0.3 m
(1 ft) resolution, and at times up to 10 cm (4 in) resolution. Because
they are orthophotographs, object base locations have been corrected
for tilt and terrain distortion at ground height. Towers, buildings,
bridges, and other tall objects often appear to tilt, as these
structure heights above ground are not corrected on the images
(Figure 7-12). These images are useful for infrastructure mapping,
planning, disaster management, and many other applications. The images
are sometimes used for vegetation mapping, but these images are most
often collected during leaf-off periods, and so must often be
complemented by images taken during leaf-on periods for most
vegetation mapping efforts.

The HRO and other high-resolution images are valuable sources of
spatial data. These images are typically processed to within a few
pixels of the delivered resolution, for example, typically accurate to
within a half-meter for the 0.3 m data. Because these and other photos
described below record the surface at a fixed point in time, they may
be used to create new maps or to monitor change (Figure 7-13).

Commercial satellite data vendors are another source of
higher-resolution images. Images in the 30 cm to 1 meter resolution
are available from Planet, Eos, Maxar, and others, as described in
Chapter 6.

NAIP Digital Images
The National Agriculture Imagery Program (NAIP) acquires photographs
during the growing season in the continental United States. NAIP
images are distinct from the previous HRO, NHAP, NAPP, and DOQ
programs because NAIP is primarily for one purpose: to monitor
agricultural landscapes. NAIP photographs are typically acquired
during the full-leaf period for local crops, so the bulk of the images
are collected from June through August, in contrast to other
photographic programs, which were often taken during leaf-off
conditions. In addition, the NAIP photographs typically have a yearly
repeat cycle, while other sources are often spaced at five-year or
longer intervals. NAIP photographs may be obtained in hardcopy or
digital formats, commonly as county mosaics. Data may be viewed from
within a GIS using a publicly accessible web service, where the data
are stored centrally, rather than on a local computer disk, as
described earlier in this chapter. These data may then be used as a
backdrop for digitizing, wherein the analyst extracts information
through a visual classification of spatial features.


Figure 7-13: An example of historical aerial photographs, from the
1940s (left) and 2008 (right).

Images are most often collected as natural color, digital aerial
photographs, although sometimes infrared bands are also collected.
NAIP images are orthorectified and provided at 1- and 2-meter ground
resolutions, with corresponding horizontal accuracies at 5 to 10 m.
Data are typically provided in an NAD83 UTM coordinate system
corresponding to the image area.
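
The UTM zone that corresponds to an image area follows directly from its longitude, since zones are 6 degrees wide and numbered eastward from 180 degrees west. A minimal sketch, ignoring the special zone exceptions near Norway and Svalbard (which do not affect the United States), and using a function name of our own choosing:

```python
import math


def utm_zone(lon_deg):
    """Return the standard UTM zone number (1 to 60) for a longitude
    given in decimal degrees, with west longitudes negative."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # a longitude of exactly 180 belongs to zone 60
```

A county mosaic centered near longitude -93.2, for instance, falls in zone 15, so it would typically be delivered in NAD83 UTM zone 15 coordinates.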


NAIP images are most useful as a base for digitizing, particularly
when information on vegetation type or condition is important
(Figure 7-14). Leaf-on NAIP images are more useful for mapping
vegetation, because differences are most often expressed in the color,
brightness, and texture of foliage. While the natural color images
typically used for NAIP images are inferior to infrared images,
substantial information on vegetation can be collected, and sometimes
the NAIP images include an infrared band. This, plus the annual image
collection cycle, make these images a valuable source of spatial data.


National Land Cover Data 


While land cover is important when managing many spatially distributed
resources, data on land cover are quite expensive to obtain over large
areas. These data are often scarce, at low categorical or spatial
resolution, and rarely available over broad areas. While individual
states, counties, metropolitan areas, or private landholders have
developed detailed land cover maps, there have been few national
efforts to map land cover in a consistent manner. There are four
consistent national data sets available, based on satellite data from
a consistent set of categories, and a legacy data set from the 1970s
and early 1980s.

The National Land Cover Database (NLCD) is the most recent and
detailed source of national land cover information. NLCD versions are
produced in a cooperative effort by a number of United States federal
government agencies, under the Multi-Resolution Land Characteristics
(MRLC) Consortium, so these are sometimes referred to as MRLC data.
The consortium's goal is a consistent, current land cover data record
for the conterminous United States. NLCD data were created in 1992,
and then with an expanded set of categories every two to three years
from 2001 through 2019. Plans are to continue production into the
future (Figure 7-15).

Figure 7-14: An example NAIP image showing wetlands (A), lakes (B),
forest (C), and residential areas.

NLCD land cover classifications are based primarily on 30 m Landsat
Thematic Mapper data. NLCD 1992 land cover was assigned to one of 21
classes. Full coverage is obtained from adjacent or overlapping
cloud-free Landsat images. Multiple dates are often acquired in order
to improve accuracy and categorical detail through phenologically
driven changes. For example, evergreen forests are more easily
distinguished from deciduous forests when both leaf-off and leaf-on
images are used. Other spatial data sets are used to improve the
accuracy and categorical detail possible through spectral data alone.
These data include digital elevation, slope, aspect, Bureau of Census
population and housing density data, USGS LULC data, National Wetlands
Inventory data, and STATSGO soils data.

Data are processed in a uniform manner within each state or region,
and a national set of categories and protocols is followed. All
classifications were subjected to a standardized accuracy assessment,
and reported and delivered in a standard format. Accuracy assessments
were based on NAPP or other medium- to high-resolution aerial
photographs or, in later versions, with high-resolution satellite
data. Areas were stratified based on the images, and sampling units
defined. Photointerpretations of land cover were assumed true, and
compared to NLCD classification assignments. Errors were noted and
reported using standard methods.
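
One widely used way to report such errors is an error (confusion) matrix, from which overall, producer's, and user's accuracies are computed. The sketch below illustrates the bookkeeping on paired samples, treating the photointerpreted label as the reference; it is not the specific NLCD assessment procedure, and the function names are ours.

```python
def error_matrix(reference, classified):
    """Cross-tabulate reference labels against classified labels.

    Returns the sorted label list and matrix[reference][classified] counts.
    """
    if len(reference) != len(classified):
        raise ValueError("reference and classified must be the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    labels = sorted(set(reference) | set(classified))
    matrix = {ref: {cls: 0 for cls in labels} for ref in labels}
    for ref, cls in zip(reference, classified):
        matrix[ref][cls] += 1
    return labels, matrix


def accuracy_summary(reference, classified):
    """Return overall accuracy, plus per-class producer's and user's accuracy."""
    labels, matrix = error_matrix(reference, classified)
    overall = sum(matrix[l][l] for l in labels) / len(reference)
    producers, users = {}, {}
    for l in labels:
        ref_total = sum(matrix[l].values())             # samples truly in class l
        cls_total = sum(matrix[r][l] for r in labels)   # samples mapped as class l
        if ref_total:
            producers[l] = matrix[l][l] / ref_total
        if cls_total:
            users[l] = matrix[l][l] / cls_total
    return overall, producers, users
```

Producer's accuracy reflects omission errors (true forest missed by the map), while user's accuracy reflects commission errors (map cells labeled forest that are something else).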


Figure 7-15: A time series of National Land Cover Data (NLCD) for an
area north of Las Vegas, NV. NLCD categorizes land cover into 21
classes, and data are provided in a 30-meter raster cell format. In
this example, urbanization is shown in the spreading tones of darker
red.

NLCD 2001 analysis was refined to yield more categories, higher
accuracy, and a more uniform classification. Landsat data from three
periods, digital elevation data, population density, road locations,
NLCD 1992, and city lights data were used (Table 7-1), and previous
data recoded to create a consistent time series. The base data were
also used to estimate percent impervious surface, and tree canopy
density. NLCD 2006 and 2011 again refined methods, incorporating
information on previous classifications, maintaining categories but
improving accuracy and uniformity.

The MRLC program also produces other regional to nationwide data
streams. The Rangeland Condition, Monitoring, Assessment, and
Projection data (RCMAP) provide percent cover for each 30 m pixel of
shrub, bare ground, sagebrush, and annual herbaceous vegetation for
western U.S. rangelands. Tree canopy data show U.S.-wide tree canopy
cover, and an Urban Impervious product shows the percent of
rainfall-impervious surface for each classified urban pixel, important
in estimating runoff and flood hazard.


NASS CDL


The National Agricultural Statistics Service (NASS) produces yearly
Cropland Data Layer (CDL) data, land cover maps that focus on
distinguishing major crop types and rotations (Figure 7-16). These
data are created from a combination of existing land cover data for
nonagricultural lands, multi-date images from mid-resolution
satellites such as Landsat and Resourcesat-1, coarse-resolution but
higher-frequency MODIS data for phenological discrimination, and
various vector data layers to improve classification accuracy.


Table 7-1: NLCD 2011 land cover classes. Classes have varied slightly
through versions.

Water
    11  open water
    12  perennial ice/snow

Developed
    21  developed, open space
    22  developed, low intensity
    23  developed, medium intensity
    24  developed, high intensity

Barren
    31  bare rock/sand/clay

Forested Upland
    41  deciduous forest
    42  evergreen forests
    43  mixed forests
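
In the classes listed above, the leading digit of each code identifies the broader group, which makes it easy to generalize a detailed raster to group level. A small sketch using only the classes shown in this excerpt of the table; the dictionary and function names are ours, and the tens-digit rule is an observation about these codes rather than a documented guarantee.

```python
NLCD_CLASSES = {
    11: "open water",
    12: "perennial ice/snow",
    21: "developed, open space",
    22: "developed, low intensity",
    23: "developed, medium intensity",
    24: "developed, high intensity",
    31: "bare rock/sand/clay",
    41: "deciduous forest",
    42: "evergreen forests",
    43: "mixed forests",
}

NLCD_GROUPS = {1: "Water", 2: "Developed", 3: "Barren", 4: "Forested Upland"}


def nlcd_group(code):
    """Return the broad group for a class code listed in NLCD_CLASSES."""
    if code not in NLCD_CLASSES:
        raise KeyError(f"class {code} is not in this excerpt of the table")
    return NLCD_GROUPS[code // 10]


def generalize(grid):
    """Replace each class code in a small raster with its group name."""
    return [[nlcd_group(code) for code in row] for row in grid]
```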


CDL land cover classification is based on extensive field surveys
conducted by the United States Department of Agriculture. Fields are
visited, airphotos obtained, and fields, farms, and regions classified
by dominant crop types and rotations. Observed crop types are compared
to spectral data from satellites, and a classification algorithm
developed. Classification methods have changed since 2002, the year
nationwide data became available annually. Class assignment accuracies
are generally between 85 and 95% for agricultural crops.

CDL data are produced annually for most regions, allowing analysis of
trends in planting, crop rotations, and harvest. Data may be
downloaded for regional, statewide, or substate areas, in standard
formats and coordinate systems.


Figure 7-16: Cropland Data Layer for Brown County, Kansas, 2008.

While NASS-CDL data are the most up-to-date and accurate land cover
classification for agricultural lands, they have limitations.
Classification for nonagricultural lands is not as rigorously ground
truthed as agricultural data, and depends on older NLCD
classifications. Land cover is classified only for counties with
agriculture, although this is a surprisingly large proportion of the
country. The 30 to 60 m cell size is quite good for such a large-area
classification, but still too coarse for field-level assessments, and
is better suited to farm-level and larger analyses.

C-CAP


The Coastal Change Analysis Program (C-CAP) aims to provide land cover
over time in coastal zones so that land cover change may be mapped for
a useful set of categories. Much like the NASS CDL, C-CAP data are
developed in close cooperation with the MRLC land cover data, using
multiple satellite systems to map over continental extent. However,
the C-CAP data add an expanded set of wetland and near-shore
categories, focusing on vegetation types and on landforms that are
particularly important in coastal areas (Figure 7-17).

Coastal areas are mapped periodically, with 10-year intervals from
1975 to 1996, then in 2001, 2011, and 2016. A five-year interval is
planned going forward. Related data are created and distributed in
concert with C-CAP land cover, including salt marsh habitat and
wetland potential.

National Wetlands Inventory 


Data on the location and condition of wetlands are available for much
of the United States through the National Wetlands Inventory (NWI)
program. NWI data are produced by the United States Fish and Wildlife
Service. NWI data portray the extent and characteristics of wetlands,
including open water (Figure 7-18), and are available for
approximately 90% of the conterminous United States. About 60% of the
conterminous United States is available in digital formats. NWI data
were produced through the 1970s and 1980s, with an update in the
1990s. Decadal updates are planned.


NWI data were produced through a combination of field visits and
airphoto interpretation. Spring photographs at a range of scales and
types are used. Color infrared photographs at a scale of 1:40,000 were
commonly used; however, black and white photographs and scales ranging
between 1:20,000 and 1:62,500 have been employed. Spring photographs
typically record times of highest water tables and are most likely to
record ephemeral wetlands. There is substantial year-to-year
variability in surface water levels, and hence there may be
substantial wetland omission when photographs are acquired during a
dry year.

Figure 7-17: An example of C-CAP data, with a more detailed set of
land cover classes than other data sets.

NWI data provide information on wetland type through a hierarchical
classification scheme, with modifiers. Wetlands are categorized as
part of a lacustrine (lake), palustrine (pond), or riverine system.
Subsystem designators then specify further attributes, to record if
the wetland is perennial, intermittent, littoral, or deep water.
Further class and subclass designators and modifiers provide
additional information on wetland characteristics. A shorthand
designator is often used to specify the wetland class. A wetland may
be designated L1UB2G, as system = lacustrine (L), subsystem = limnetic
(1), class = unconsolidated bottom (UB), with subclass = sand (2), and
a modifier indicating the wetland is intermittently exposed (G).

Figure 7-18: An example of national wetlands inventory (NWI) data.
Digital NWI data are available for most of the United States, and
provide information on the location and characteristics of wetlands.
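
The shorthand designator can be split into its parts mechanically. The sketch below assumes the layout implied by the L1UB2G example: one system letter, an optional subsystem digit, a two-letter class, an optional subclass digit, and trailing modifiers. It decodes only the three systems named above, real NWI codes include further systems and mixed classes that this pattern does not handle, and the function name is ours.

```python
import re

# Only the systems named in the text are decoded here.
SYSTEMS = {"L": "lacustrine", "P": "palustrine", "R": "riverine"}

# system letter, optional subsystem digit, two-letter class,
# optional subclass digit, then any modifier characters
NWI_PATTERN = re.compile(r"([A-Z])(\d?)([A-Z]{2})(\d?)([A-Za-z0-9]*)")


def parse_nwi_code(code):
    """Split a simple NWI designator into its hierarchical parts."""
    match = NWI_PATTERN.fullmatch(code)
    if match is None or match.group(1) not in SYSTEMS:
        raise ValueError(f"unrecognized NWI designator: {code!r}")
    system, subsystem, wet_class, subclass, modifiers = match.groups()
    return {
        "system": SYSTEMS[system],
        "subsystem": subsystem or None,
        "class": wet_class,
        "subclass": subclass or None,
        "modifiers": modifiers or None,
    }
```

Parsing the designator into separate fields lets an analyst select, for example, all lacustrine wetlands regardless of class, rather than matching whole code strings.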


The minimum mapping unit (MMU) is the target size of the smallest
feature captured. Features smaller than the MMU are not recorded in
these data. NWI data typically specify MMUs of between 0.5 and 2 ha.
MMUs vary by vegetation type, film source, region, and time period.
MMUs are typically largest in forested areas and smallest in
agricultural or developed areas, because it is more difficult to
detect many forested wetlands. MMUs also tend to be larger on
smaller-scale photographs. The MMU, scale, and other characteristics
of the wetlands data are available in map-specific metadata.
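
The effect of an MMU can be illustrated by filtering a set of candidate features by area. This is a sketch only, assuming each feature is a dictionary with a hypothetical `area_m2` field holding its area in square meters; one hectare is 10,000 square meters.

```python
SQ_M_PER_HA = 10_000  # one hectare in square meters


def apply_mmu(features, mmu_ha):
    """Keep only features at least as large as the minimum mapping unit."""
    min_area_m2 = mmu_ha * SQ_M_PER_HA
    return [f for f in features if f["area_m2"] >= min_area_m2]


# Hypothetical wetland polygons with areas of 0.3, 0.5, and 2.5 ha.
wetlands = [
    {"id": 1, "area_m2": 3_000},
    {"id": 2, "area_m2": 5_000},
    {"id": 3, "area_m2": 25_000},
]
```

With a 0.5 ha MMU the smallest wetland is dropped, and with a 2 ha MMU only the largest remains, which shows how two maps of the same area can legitimately disagree about small wetlands.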


NWI data do not exhaustively define the location of wetlands in an
area. Because of the photo scales and methods used, many wetlands are
not included. Statutory wetland definition typically includes not only
surface water, but also characteristic vegetation or evidence on the
surface or in the soils that indicates a period of saturation. Since
this saturation may be transient or the evidence may not be visible on
aerial photographs, many wetlands may be omitted from the NWI.
Nonetheless, NWI data are an effective tool for identifying the
location and extent of large wetlands, the type of wetland, and for
directing further, more detailed ground surveys.


Digital Soils Data 


The Natural Resources Conservation Service (NRCS) of the United States
Department of Agriculture has developed three digital soils data sets.
These data sets differ in the scale of the source maps or data, and
thus in the spatial detail and extent of coverage. The National Soil
Geography (NATSGO) data set is a highly generalized soils map for the
continental United States, developed from small-scale maps. NATSGO
data have limited use for most regional or more detailed analyses, and
will not be further discussed here. State Soil Geographic (STATSGO)
data are intermediate in scale and resolution, and Soil Survey
Geographic (SSURGO) data provide the most spatial and categorical
detail.

Figure 7-19: An example of SSURGO digital soils data available from
the NRCS. Each polygon represents a soil mapping unit of relatively
uniform soil properties.

SSURGO data are intended for use by land owners, farmers, and planners
at the large farm to county level. SSURGO maps indicate the location
and extent of the soil map units within the soil survey area
(Figure 7-19). Soil map units typically correspond to phases of
detailed soil mapping types. These detailed mapping types are called
soil series. There are approximately 18,000 soil series in the United
States, and several phases for most series, so there are potentially a
large number of map units. Only a small subset of series is likely to
occur in a mapped area, typically fewer than a few hundred soil series
or series phases. A few to thousands of distinct polygons may occur.


SSURGO data are developed from a combination of field and photo-based
measurements. Trained soil surveyors conduct a series of field
transects in an area to determine relationships among soil mapping
units and terrain, vegetation, and land use. Aerial photographs at
scales of 1:12,000 to 1:40,000 are used in the field to aid in
location and navigation through the landscape. Soil map unit
boundaries are then interpreted onto aerial photographs or
corresponding orthophotographs or maps. Typical photo scales are
1:15,840, 1:20,000, or 1:24,000. These maps are then digitized in a
manner that does not appreciably affect positional accuracy. Soil
surveys are often conducted on a county basis, so county mosaics of
SSURGO data are common. SSURGO data are reported to have positional
accuracy no worse than 13 m (43 ft) for approximately 90% of the
well-defined points when SSURGO data are compiled at 1:24,000 scale.
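
It can help to translate the reported ground accuracy into a distance on the source map. The short sketch below performs that conversion, along with the metric to imperial conversion used above; the function names are ours.

```python
def ground_to_map_mm(ground_m, scale_denominator):
    """Convert a ground distance in meters to the equivalent distance,
    in millimeters, on a map at 1:scale_denominator."""
    return ground_m / scale_denominator * 1000.0


def meters_to_feet(meters):
    """Convert meters to international feet (1 ft = 0.3048 m)."""
    return meters / 0.3048
```

At 1:24,000, a 13 m ground distance corresponds to a little over half a millimeter on the map, roughly the width of a drawn line, which suggests why digitizing need not appreciably degrade positional accuracy.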

SSURGO data are linked to a Map Unit Interpretations Record (MUIR)
attribute database (Figure 7-20). Key fields are provided with the
SSURGO data, including a unique identifier most often related to a
soil map unit, known as the map unit identifier (muid). Tables in the
MUIR database are linked via the muid, and other key fields. Most
tables contain the muid field, so a link may be created between the
muid value for a polygon and the muid value in another table, such as
the Compyld table (Figure 7-20). This creates an expanded table that
may be further linked through cropname, classcode, or other key
fields. These kinds of table structures and linkages are discussed in
Chapter 8.
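
The link through muid amounts to a one-to-many table join. The following sketch illustrates the idea with plain Python records; the muid values, polygon identifiers, and crop entries are invented for illustration, and only the `muid` and `cropname` field names come from the description above.

```python
from collections import defaultdict

# Hypothetical polygon records, each carrying its map unit identifier.
soil_polygons = [
    {"polygon_id": 1, "muid": "MU001"},
    {"polygon_id": 2, "muid": "MU002"},
    {"polygon_id": 3, "muid": "MU001"},
]

# Hypothetical rows standing in for a crop yield table keyed by muid.
compyld = [
    {"muid": "MU001", "cropname": "corn"},
    {"muid": "MU001", "cropname": "soybeans"},
    {"muid": "MU002", "cropname": "alfalfa"},
]


def join_on_muid(polygons, table):
    """Pair every polygon with every table row sharing its muid."""
    rows_by_muid = defaultdict(list)
    for row in table:
        rows_by_muid[row["muid"]].append(row)
    joined = []
    for poly in polygons:
        for row in rows_by_muid.get(poly["muid"], []):
            joined.append({**poly, **row})
    return joined


expanded = join_on_muid(soil_polygons, compyld)
```

Because one map unit may have several crop records, the expanded table has more rows than there are polygons, and it could be linked again through cropname to reach further tables.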

Variables include an extensive set of soil physical and chemical
properties. Data are reported for water capacity, soil pH, salinity,
depth to bedrock, building suitability, and most appropriate crops or
other uses. Most MUIR data report a range of values for each soil
property. Ranges are determined from representative field-collected
samples for each map unit, or from data collected from similar map
units. Samples are analyzed using standardized chemical and physical
methods.

STATSGO digital soil maps are also available, at a smaller scale and
over broader areas than SSURGO soil data. STATSGO data are typically
created by generalizing SSURGO data. STATSGO map units are larger,
more generalized, and do not necessarily follow the same boundaries as
SSURGO map units. In addition, STATSGO map units contain from one to
over 20 different SSURGO detailed map units. Each STATSGO map unit may
be made up of thousands of these more detailed SSURGO polygons, and
many different SSURGO map unit types can be represented within a
STATSGO polygon. STATSGO data provide information on some of this
variability. Data and properties on multiple components are preserved
for each STATSGO map unit.


Digital Floodplain Data 

Floods cause billions of dollars in damage each year in the United
States; losses could be reduced with the effective application of GIS.
A first step is the mapping of flood-prone areas. The Federal
Emergency Management Agency (FEMA) develops and disseminates flood
hazard maps, commonly known as floodplain maps (Figure 7-21). These
maps locate the boundary of areas with a 1% or higher annual chance of
flooding, commonly known as 100-year floodplain maps.
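
The "100-year" label is easy to misread, because a 1% annual chance accumulates over time. As a simple illustration, assuming flood occurrence in each year is independent of other years (a simplification), the chance of at least one flood over a span of years can be computed as follows; the function name is ours.

```python
def chance_of_at_least_one_flood(annual_chance, years):
    """Probability of one or more floods in the given number of years,
    assuming each year is independent with the same annual chance."""
    if not 0.0 <= annual_chance <= 1.0:
        raise ValueError("annual_chance must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    return 1.0 - (1.0 - annual_chance) ** years
```

Under this assumption, a property on the boundary of the 1% floodplain has roughly a one in four chance of flooding at least once in 30 years, and about a 63% chance over 100 years.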

FEMA occasionally updates these
floodplain maps. The objectives are to
develop maps of flood hazard via an
improved process, with better input data, in a
uniform digital format, and to integrate map
creation into ongoing local and state
government mapping and planning efforts. Updates
are particularly important given the changes
in precipitation intensity and sea level
rise due to changing climates.
Figure 7-21: An example of FEMA floodplain data for a region near Morgan, Georgia, USA.


Floodplain maps are used for a number
of purposes, chief among them setting flood
insurance rates. Over 19,000 communities
participate in the National Flood Insurance
Program (NFIP). This program allows the federal
government to guarantee flood insurance for
communities with floodplain management
ordinances. Ordinances reduce flooding
risks for redevelopment and new
construction, thereby reducing losses.

Digital floodplain maps are produced to
define regions within a 100-year floodplain.
Boundary accuracy may be challenged,
usually by individual landowners, businesses, or
municipalities, and the proposed
adjustments evaluated and included in the
floodplain map if they improve accuracy.

Maps are most often produced by
cooperating technical partners (CTPs). Expertise,
training, and demonstrated capabilities are
required of each CTP. Best technical
practices for digital floodplain data development
are defined by FEMA, and training is offered
to teach best available methods and increase
data quality. Protocols for verifying and
revising maps are defined as part of the data
evaluation process.


Climate, Geology, and Other 
Environmental Data 


Other spatial environmental data sets are
available, including climate, water
chemistry, and energy resources. Here we provide
examples, but there are many others.

The National Climatic Data Center
(NCDC) maintains historical climate records
for the United States, and provides their data
through a Web portal
(https://www.ncei.noaa.gov/maps-and-geospatial-products).
Recording stations may be
selected by various criteria, including
geography, measured variables, or length of
record. Climate data have been converted to
spatial fields, and are distributed through the
PRISM initiative (prism.oregonstate.edu,
see Figure 7-22).

Mineral resources data are available
from the United States Geological Survey, at
http://mrdata.usgs.gov. These data include
maps of basic national geology, as well as
spatial and tabular data on specialized
themes such as mineral deposits, mines,
claims, smelters and other processing
facilities, and energy resources.

Figure 7-22: U.S. average climate data from 1971-2000, stored as a raster grid (data from the PRISM project).

Spatial data are available for a range of
other environmental parameters, including
air pollution, pollutant and contaminant
distribution, and some water pollutants, through
the Environmental Protection Agency, many
at www.epa.gov/data.


Digital Census Data 


The United States Census Bureau
developed and maintains a database system to
support the national census. This system is
known as the Census TIGER system
(Topologically Integrated Geographic Encoding
and Referencing). The TIGER system is
used to organize areas by state, county,
census tract, and other geographic units for data
collection and reporting. It also allows the
assignment of individual addresses to
geographic entities. The Census TIGER system
links geographic entities to census statistical
data on population size, age, income, health,
and other factors (Figure 7-23). These
entities are typically polygons defined by roads,
streams, political boundaries, or other
features. The TIGER system is a key
government tool in the collection of census data.
TIGER also aids in the application of census
data during the apportionment of federal
government funds, in congressional
redistricting, in transportation management and
planning, and in other federal government
activities.

TIGER/Line files are at the heart of the
system. They define line, landmark, and
polygon features in a topologically
integrated fashion. Lines most often represent
roads, hydrography, and political
boundaries, although railroads, power lines, and
pipelines are also represented. Polygon
features include census tabulation areas such as
census block groups and tracts, and area
landmarks such as parks and cemeteries.
Point landmarks such as schools and
churches may also be represented. Points,
lines, and polygons are used to define these
features (Figure 7-24).

Nodes and vertices are used to identify
line segments. Topological attributes are
attached to the nodes and lines, such as the
polygons on either side of the line segment,
or the line segments that connect to the node.
Point landmarks and polygon interior points
are other topological elements of TIGER/Line
files.


Figure 7-23: Digital census data provide spatially referenced data for the U.S. The example maps population change by census tract, 1990-2000.

Figure 7-24: TIGER data (U.S. Dept. of Commerce).


TIGER/Line files contain information to
identify street address labels. Starting and
ending address numbers are recorded,
corresponding to starting and ending nodes
(Figure 7-25). Addresses may then be assigned
within the address range. The system does
not allow specific addresses to be assigned
to specific buildings. However, it does
restrict the addresses on a city block to a
limited range of numbers, something of great
use to field workers responsible for
collecting census information.
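
The address-range idea can be sketched in a few lines of Python. This is only an illustration, assuming simple linear interpolation along a straight segment between two nodes; the function name, node coordinates, and address numbers are hypothetical and are not taken from any TIGER/Line file.

```python
def interpolate_address(number, start_number, end_number, start_node, end_node):
    """Estimate (x, y) for a house number by linear interpolation along a segment.

    start_node and end_node are (x, y) tuples; start_number and end_number are
    the address numbers recorded at those nodes (hypothetical values here).
    """
    low, high = sorted((start_number, end_number))
    if not low <= number <= high:
        raise ValueError("address number is outside the segment's address range")
    if end_number == start_number:
        fraction = 0.0
    else:
        fraction = (number - start_number) / (end_number - start_number)
    x = start_node[0] + fraction * (end_node[0] - start_node[0])
    y = start_node[1] + fraction * (end_node[1] - start_node[1])
    return x, y


# A block with addresses 100 to 198 along a 98-unit east-west segment.
location = interpolate_address(149, 100, 198, (0.0, 0.0), (98.0, 0.0))
print(location)
```

The result is an approximate position on the street, not a building location, which is consistent with the limits described above.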

TIGER/Line files are organized by
different record types. A collection of records
reports the location and attribute information
about a set of census features, including the
location, shape, addresses, and other census
attributes for a county. There is an identifier
based on the United States Federal
Information Processing Standards (FIPS) code that is
used to identify the file and record type.


Census data are distributed as ESRI
geodatabases and as shapefiles for various
geographic units. In addition, specialized
software packages are available to ingest
TIGER/Line and related census and other
federal government data files into data layers
in specific GIS formats. Data are collected
on activities, health, and crime, among
other topics. These data may be extracted in
a customized manner.

Figure 7-25: TIGER data provide address ranges for line segments. These ranges may be distributed along the line, giving the approximate location of an address on a street (U.S. Dept. of Commerce).

Many United States government data
are provided with codes compatible with
United States Census Bureau data. For
example, the U.S. Department of
Transportation and the U.S. Centers for Disease
Control deliver data with codes needed to link
statistics to geography. Figure 7-26 shows an
example, mapping the average traffic fatality
rate from 1997 through 2006. These data can
be conveniently combined because data are
tagged by county FIPS codes, census tracts,
block groups, or other census geographies.

While many data are available, they
often require processing after download to
render them useful. This may include some
form of editing to winnow table variables or
remove unwanted features, or raster
resampling, or coordinate projection, or datum
transformation, or conversion among data
models, all activities covered in previous
chapters.

Processing may also entail type
conversion, e.g., many point data are distributed as
coordinates in a table, and these must be
converted to point features in a data layer.
Most GIS software packages provide utilities to
convert X and Y coordinate data to shapefiles,
geodatabase layers, or other common spatial
data formats.
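
As a rough sketch of this table-to-points conversion, the following Python uses only the standard library to turn rows with X and Y columns into point features. GeoJSON is chosen here purely for illustration (the text names shapefiles and geodatabase layers); the table contents, column names, and function name are hypothetical.

```python
import csv
import io
import json

# Hypothetical table of sample sites; in practice this would be a downloaded file.
table_text = """site,x,y,ph
A,-93.10,45.00,6.5
B,-93.25,44.95,7.1
"""


def table_to_points(text, x_field="x", y_field="y"):
    """Convert rows with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": row,  # remaining columns become attributes (kept as text)
        })
    return {"type": "FeatureCollection", "features": features}


points = table_to_points(table_text)
print(json.dumps(points["features"][0], indent=2))
```

A real conversion would also need to record the coordinate system of the X and Y values, which a bare table usually does not carry.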


Processing may also require conversion
among data models, e.g., from vector to
raster. Most GIS software packages provide conversion
tools among data models, although on
conversion we must bear in mind the conceptual
differences among models. Remember that
vector data don't store information on areas
outside of features; for example, the areas
outside of polygons are undefined. When
converting polygon data to raster data, the
raster cells in these “in between” locations
may be assigned a value that indicates “null”
or “nodata”, or otherwise flagged as
unknowns (Figure 7-27). These nodata
values are sometimes set to a specific numeric
value, often zero or an implausible negative
number such as -9999, or some other value
that is not present in the feature data values.
This may affect further processing, and these
unknown areas often must be modified
before subsequent analysis.

Figure 7-27: Vector lakes, with undefined regions outside the polygons, and their raster conversion.

Some spatial functions are nonsensical on these codes,
e.g., the natural log function when applied
to negative numbers. Other software packages simply
won't apply most functions to nodata raster
cells, returning an output value of nodata
when such a cell is encountered in any function or
processing. Typically there are functions
which explicitly evaluate cells for nodata
values, and allow reassignment. These and
other processing tools for vector and raster
data are described from Chapter 9 onwards.
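
The effect of nodata codes on later processing can be illustrated with a small Python sketch. It assumes a nodata code of -9999 and a tiny raster held as nested lists; the grid values and the function name are hypothetical, and real GIS software would handle this with its own raster tools.

```python
import math

NODATA = -9999  # an assumed nodata code; check the metadata of real data sets

# Hypothetical 3 x 3 raster of lake depths; cells outside lakes are nodata.
depth = [
    [NODATA, 2.0, 4.0],
    [NODATA, NODATA, 8.0],
    [1.0, NODATA, NODATA],
]


def log_depth(grid, nodata=NODATA):
    """Apply the natural log to valid cells and pass nodata cells through unchanged."""
    return [
        [nodata if cell == nodata else math.log(cell) for cell in row]
        for row in grid
    ]


result = log_depth(depth)

# Summary statistics must also skip the nodata cells.
valid = [cell for row in depth for cell in row if cell != NODATA]
mean_depth = sum(valid) / len(valid)
print(result[0])
print(mean_depth)
```

Calling `math.log` directly on -9999 would raise an error, and including the code in the mean would give a large negative number, which is why the cells are tested first.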


Summary 

Digital data are available from a number
of sources, and provide a means for rapidly
and inexpensively populating a GIS
database. Most of these data have been produced
by government organizations and are
available at little or no cost, often via the Internet.
Data for elevation, transportation, water
resources, soils, population, land cover, and
imagery are available, and should be
evaluated when creating and using a GIS.

Data available can seem to be an
alphabet soup of acronyms. Below is a list
organized by topic.
Global Data:
- OSM (OpenStreetMap, vector features)
- Natural Earth (mid to small scale)
- NASA EOSDIS, ESA DIAS (satellite data)
- GMRT (ocean topography)
- SRTM (global elevation)
- OpenTopography (elevation)
- TRMM (global precipitation)
- Landsat (global land cover, vegetation)
- SEDAC, Terra Populus (population, socioeconomic data)
- UNEP (broad range of environmental data)
- LandScan (global population estimates)

U.S./North American Data:
- National Map (various U.S. data)
- 3DEP, OpenTopography (DEMs)
- NHD, NHDPlus, CUAHSI (hydrology)
- NAIP, NAPP, USGS HRO (hi-res images)
- MRLC, NASS CDL, C-CAP (land cover)
- NWI (wetlands)
- SSURGO, STATSGO (soils)
- NFHL, FEMA (floodplain, hazard data)
- TIGER/Line (digital census boundaries, tables)




Study Questions 
7.1 - What are some advantages and disadvantages of using digital spatial data?


7.2 - What are the most important questions you must ask before using already
developed spatial data?


7.3 - For each of the following data sets, tell us who produces them, what are the
source materials, what do the data sets contain, their grain sizes and accuracies, and
how they are delivered: NAIP digital photographs, NHDplus, digital elevation
models (DEMs), digital floodplain data, National Wetlands Inventory (NWI) data, TIGER
census data, and national land cover data (NLCD) sets.


7.4 - What is tile edge matching and why is it important?


7.5 - Identify and describe the characteristics of three different sources of digital
elevation data. What are the pros and cons of each source?


7.6 - Visit one of the websites mentioned in this chapter, or in the appendices at the 
end of this book, and download several data layers of an area of interest. If you have 
access to a GIS, try to import these data and display them. 


8 Attribute Data and Tables 


Introduction 


We have described how spatial data in a
GIS are often split into two components, the
coordinate information for object geometry,
and the attribute information for the
nonspatial properties of objects. Because these
nonspatial data are frequently presented to the
user in tables, they are often referred to as
tabular data. Tabular data summarize the
most important characteristics of each
object, for example, attributes about counties
(Figure 8-1). In this example, the attributes
include the county name, Federal
Information Processing Standards (FIPS) code,
population, and area.


Attribute information in a GIS is
typically entered, analyzed, and reported using a
database management system (DBMS), a
specialized computer program for organizing
and manipulating data. The DBMS stores the
properties of geographic objects and the
relationships among the objects. A DBMS
incorporates software tools for managing tabular
data, including those for efficient data
storage, retrieval, indexing, and reporting.
DBMSs were initially developed in the
1960s, and refinements since then have led to
robust, sophisticated systems employed by
government, businesses, and other
organizations. A somewhat standard set of DBMS
tools and methods has been developed and
is provided by many vendors.

Figure 8-1: Data in a GIS include both spatial (left) and attribute (right) components.


Students often struggle with relational
databases at first, and often ask, “Why
bother? Can't we just use a spreadsheet?”
Many more people are familiar with
spreadsheet forms, programs, and manipulation,
and don't understand why we bother to
adopt a DBMS. A short example may help
explain their value.


Consider the file shown in Figure 8-2,
representing business orders. Each row
records the purchaser, an order number, and
the items ordered. Spreadsheets typically
present data like this in a single “flat file.”
Because orders may contain multiple items,
we need multiple columns with copies of the
item/quantity pair. For example, order
number five by Atom Ant includes two items,
two BS2s and two CRTs, while order
number three by Paul Smith has four items, or a
total of 8 columns for items. Larger orders
would require additional columns.


This flat file structure is sub-optimal.
We either have to limit the number of items
per order (rarely a good thing for a
business), or else not know how many columns
our database might have, which would
complicate programming and management. More
importantly, we can easily have mostly
blank entries in our database. We may have
thousands of orders with one or two items.
However, if we have one order with 50
different items, we have to add enough
columns to accommodate 50 item/quantity
pairs, even for orders with one item. As with
orders 1, 2, 4, 6, 7, and 8 in the table below,
many of the cells will be empty. More data
adds to storage costs and causes longer
processing times. This flat file structure is prone
to be both slow and space hungry.


There is another disadvantage. Note the 
two orders from Paul Smith. This requires 
copies of his name and contact information. 
This wastes space and makes editing more
cumbersome and error prone. If Paul Smith 
changes his phone number, we must search 
through every line in our database and 
change every instance of an order that con- 
tains Paul Smith's phone number. 


Figure 8-2: An example of data in a flat file format.

Workarounds may be written to reduce the
waste of space, slow
processing, and difficult editing in flat file
formats. This requires specific knowledge of
the file structure, for example the
arrangement and number of columns. While these
workarounds are possible, they are often
complicated and require substantial program
maintenance. Database management
systems were developed to overcome these
redundancies and inefficiencies, among
others. While spreadsheets or other flat file
formats may be used for simple collections of
data, DBMSs are better for most
applications that process large amounts of data.
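
To make the contrast with a flat file concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The three-table layout, the table and column names, the phone numbers, and the item codes other than BS2 and CRT are all assumptions made for illustration; they are not the structure shown in Figure 8-2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id));
CREATE TABLE order_items (order_id INTEGER REFERENCES orders(order_id),
                          item TEXT, quantity INTEGER);
""")
con.execute("INSERT INTO customers VALUES (1, 'Paul Smith', '555-0100')")
con.execute("INSERT INTO customers VALUES (2, 'Atom Ant', '555-0101')")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(3, 1), (5, 2), (8, 1)])
# One row per item ordered, so an order may hold any number of items
# without adding columns or leaving empty cells.
con.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(3, "BS2", 1), (3, "CRT", 1), (3, "KB1", 2), (3, "MS4", 1),
     (5, "BS2", 2), (5, "CRT", 2), (8, "BS2", 1)],
)

# The phone number is stored once, so one update covers every order.
con.execute("UPDATE customers SET phone = '555-0199' WHERE name = 'Paul Smith'")

rows = con.execute("""
    SELECT o.order_id, c.name, c.phone, COUNT(*) AS n_items
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN order_items AS i ON i.order_id = o.order_id
    GROUP BY o.order_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Both of Paul Smith's orders report the new phone number after a single update, which is the editing advantage described above.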


DBMSs provide other advantages. They
may provide data independence, a valuable
characteristic when working with large data
sets. Data independence allows us to make
changes in the database structure in ways
that are transparent to any user or program.
This means restructuring the database does
not require a user or programmer to recode
or modify their procedures.


DBMSs may also provide for multiple
user views, with different data provided in a
different form to each program or user. A
DBMS may allow centralized control and
maintenance of important data. One standard
copy of the data may be stored and updated
on a regular, known basis, time stamped or
version numbered to aid in management. A
manager is charged with maintaining data
currency, quality, and completeness, and
with resolving contradictions among various
versions of the database.


Adopting a DBMS may come at some
cost. Specialized training is required,
although standard DBMS programming
languages are used across many systems.
Defining the components of a database and
relationships among them is often a complex
task. Structuring the database for efficient
access or creating customized forms will
often require significant effort. The software
itself may be expensive, although free,
stable, open source database management
software is available. Users may need to
optimize speed for certain operations that are
too slow in a DBMS. However, for many
users, the value of the DBMS far outweighs
these costs.
Database Components and Characteristics


The basic components of a traditional
database are data items or attributes, the
indivisible units of data like name,
population, or area (Figure 8-3). These items can be
any characteristic used to describe things.
Attributes may be simple, for example, one
word or number, or they may be compound,
for example, an address data item that
consists of a house number, a street name, a city,
and a zip code.


Figure 8-3: Components of an attribute data table.


Items have a type and a domain that
restrict the values they may take. Types
define essential characteristics of an item.
Common types include real numbers and integer
numbers, both of various lengths,
hexadecimal numbers, text fields, hyperlinks, and
binary large objects (blobs). Domains define
the acceptable values an item may take; for
example, integers may be restricted to be
larger than 0 but smaller than 10, or there
may be a type named “color” that can only
take on the values “red”, “green”, or “blue.”
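
One way to see types and domains at work is with column declarations and CHECK constraints in SQLite, accessed here through Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table and column names are hypothetical, and other database systems express domains with their own syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE features (
        id INTEGER PRIMARY KEY,
        score INTEGER CHECK (score > 0 AND score < 10),
        color TEXT CHECK (color IN ('red', 'green', 'blue'))
    )
""")
con.execute("INSERT INTO features VALUES (1, 5, 'red')")  # inside both domains

# Rows with a value outside a domain are refused by the database.
rejected = []
for row in [(2, 12, "green"), (3, 4, "purple")]:
    try:
        con.execute("INSERT INTO features VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = con.execute("SELECT COUNT(*) FROM features").fetchone()[0]
print(rejected, stored)
```

Only the first row is stored; the other two violate the integer range or the list of allowed colors.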


A collection of related data items that
are treated as a unit represents an entity. In a
GIS, the database entities are typically roads,
counties, lakes, or other types of geographic
features. A specific entity, such as a specific
county, is an instance of that entity. Entities
are defined by a set of attributes and
associated geographic data. In our example in
Figure 8-3, the attributes that describe a
county include the name, a FIPS code, the
1990 population in thousands of persons,
area, and the population density. These
related data items are often organized as a
row or line in a table, called a record.
Specific database systems often define the terms
differently for each of these parts. For
example, in the relational database model, the
record may be called a row or an n-tuple, and
the table may be referred to as a relational table, or
sometimes as just a relation.


You should note that the concept of an
entity, when referred to in a database, may
be slightly different than an entity in a GIS
data model. This difference stems from two
different groups, geographers and computer
scientists, using a word for different but
related concepts. An entity in a geographic
data model is often used for the real-world
thing we are trying to represent with a
cartographic object. In contrast, computer
scientists and database managers often define an
entity as the principal data object about
which information will be collected. In the
DBMS literature, the entity is the data object
that denotes physical things, and not the
thing itself. This is a subtle distinction in
terminology, but these different definitions
can lead to confusion unless the difference in
meanings is noted. For the remainder of this
chapter, we will use the definition of an
entity as a data object.


A DBMS typically supports complex
structures, primarily to provide data security,
to maintain stability, and to allow multiple
users access. Database users often demand
shared access, when multiple users can
access a data set simultaneously. If each
program or user has direct file access, multiple
copies of a database may be open for
modification at the same time (Figure 8-4, top).
Multiple users may try to write to the data
file simultaneously, with unforeseen results.
The data saved may be the most recent, the
first updates, or some mix in between.
DBMSs are usually designed to manage
multiuser access (Figure 8-4, bottom) and
prevent these errors.




The separation of data and functions
into multiple levels is often referred to as a
multi-tiered architecture (Figure 8-5). Data
are primarily stored at the lowest tier. These
data may be of diverse types, including
coordinate data, attributes, text, images, sound,
video recordings, or other important,
persistent data.


Data sets at the lowest tier may be
managed by an individual database system
(Figure 8-5). The system or program that accesses
the first tier, at the bottom of a multi-tiered
system, is often called a transaction
manager. This transaction manager typically
takes requests from higher tiers and searches
the relevant portions of the database to
identify the requested data, or perform the
requested operation.

The middle tier is often referred to as an
applications server (Figure 8-5). This tier
passes requests from higher-level tiers into a
set of instructions the database(s) below can
“understand.” For example, a real estate
agent may want to identify all houses in a
certain price range, in good school districts,
and near rapid transit stations for a
prospective buyer. The applications server may
generate three different requests: one to find
houses in a price range, a second to identify
good school districts, and a third to
find rapid transit stations. The applications
server may then perform the operations to
determine where these important criteria are
met.


The uppermost tier of multi-tiered
architectures is typically a user interface (Figure
8-5). This tier may be a display by a
single-purpose or topic-specific program such as a
GIS, or a Web-based interface with the
primary purpose of gathering requests from the
user, and presenting information back to the
user based on those requests.


Multi-tiered architectures are adopted
primarily to insulate the user interface from
the processing and data at lower tiers, and to
allow access to a more diverse range of data
through the lowest tiers. The parts are easier
to change when they are isolated, and new,
different resources may be more easily
integrated. If a company decides to redesign
their data entry interface, they may do so
easily if the user interface is distinct from the
tiers below. They do not have to worry about
how the applications server or transaction
manager access the databases lower down.
The integration of a new database
technology is often easier with multi-tiered
architectures.


Multiuser access adds substantial
complexity to processing. For example, the
server must ensure that when several copies
of a database are accessed, changes to the
database are reconciled on
resubmission. If the updates from two clients conflict,
such as when one client deletes a record
while a second client modifies a value for
the same record, the program must resolve
the differences: perhaps one user has higher
priority, the most recent changes are
enforced, or a message is sent to an operator
noting the ambiguity.
Relational databases have grown to
become the most common database design
since their introduction in 1968. Relational
models are more flexible than most other
designs. The table structure does not restrict
processing or queries, and is not too difficult
to understand and implement relative to
other database designs. It can accommodate
a wide range of data types, and it is not
necessary to know in advance the kind of
queries, sorting, and searching that will be
performed on the database.


There is typically a cluster of tables, or
relations, in a relational database design, as
shown for forest and related recreation data
in Figure 8-6. Entities are represented by
rows in a table. In our forest data example,
there may be a forest table with a row for
each forest, and other tables representing the
trails, trail features, and recreational
opportunities. As noted earlier, the rows are also
called records or tuples.


Tables are related through keys, one or
more columns that meet certain
requirements and may be used to index the rows.
Keys are often a column that uniquely
identifies every row in a table. We often assign a
unique number or code to be a key; for
example, a Social Security number may serve as a key
for a set of people in the United States. No
two people have the same valid Social
Security number, so we can use the number to
connect a row of information to a specific
person.


Keys are used to join data from one
table to associated data in another table
(Figure 8-7). Keys are the “key” to the utility and
flexibility of relational databases. They
allow us to mix and match data from various
tables; to display data differently for
different projects or audiences; to organize our
data in ways that help us more quickly
search, select, and update our data; and to
isolate our data from calling programs or
changes in computer hardware.

Figure 8-7: Forest and trails data in a relational data structure. Rows hold records associated with an entity, and columns hold items. A key, here Forest-ID, is used to join tables.


Figure 8-7 shows a join of our forest and
trails data in a relational data structure,
through the key Forest-ID in the Forests and
Trails tables. This shows how keys allow us
to break our data up into several tables and
reap all the benefits described above, while
providing a mechanism to link between
tables. Note that this linkage of separate
tables is usually transparent to the end user,
in that all or subsets of the forest and trails
data in Figure 8-7 may be displayed on
screen or printed as one continuous table.
Data from three or more distinct tables in a
DBMS are often joined, and column
subsets displayed in what appears to be one
table to a user.
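
A join through a key such as Forest-ID can be sketched with Python's built-in `sqlite3` module. The forest and trail names, the identifier values, and the column names below are hypothetical stand-ins, not the contents of Figure 8-7.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE forests (forest_id INTEGER PRIMARY KEY, forest_name TEXT);
CREATE TABLE trails (trail_id INTEGER PRIMARY KEY, trail_name TEXT,
                     forest_id INTEGER REFERENCES forests(forest_id));
""")
con.executemany("INSERT INTO forests VALUES (?, ?)",
                [(1, "North Woods"), (2, "Pine Ridge")])
con.executemany("INSERT INTO trails VALUES (?, ?, ?)",
                [(10, "Lake Loop", 1), (11, "Summit", 2), (12, "Bog Walk", 1)])

# The shared forest_id column links each trail to its forest; the user
# sees one combined table even though the data live in two.
joined = con.execute("""
    SELECT t.trail_name, f.forest_name
    FROM trails AS t
    JOIN forests AS f ON f.forest_id = t.forest_id
    ORDER BY t.trail_id
""").fetchall()
print(joined)
```

The forest name is stored once per forest and repeated only in the displayed result, not in the underlying tables.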


A database is often represented in a
schema. A schema is a compact graphical
representation of the conceptual model, the
entities, tables, keys, and the relationships
among them. It is often presented in a
standard shorthand notation, usually via
entity-relationship diagrams, also known as E-R
diagrams. We will not describe E-R
diagrams here, as they are more appropriate for
more advanced courses.


Given their importance, there are some
restrictions on keys. For example, null
values are not allowed to be part of a key. There
may be many potential keys (columns that
uniquely identify each row), but usually only
one is chosen for use, called a primary key.
Most tables in the database will have a
primary key, and keys in a table are frequently
used to combine tables. Some keys are used
to index and add flexibility in selecting data.
Primary Operators


The relational algebra defined by E.F.
Codd supports eight primary operators:
restrict, project, union, intersection,
difference, product (all combinations of a given
set of variables recorded in a database), join
(combine tables based on matching attribute
values), and divide (facilitate queries based
on condition). Here we will focus on the
most commonly applied database operations
in GIS (Figure 8-8): restrict, project, union,
and join.


Restrict and project operations select
based on rows and columns, respectively, to
provide reduced tables. Restrict, embodied
in a table query, serves up records based on
values for given variables. The restrict in
Figure 8-8a is specified to restrict the current
set to those that have a size that is big or
huge; all other entries in the relation are not
selected (remember, tables are called
“relations” in a relational database). The restrict
then only returns four of the seven records,
as shown in Figure 8-8a.


Restrict operations can be compound
and complex and involve more than one
attribute. Restrict operations most often
return a reduced set of rows for a table as
output. Examples of more complex restrict
operations, or table queries, will be shown
later in this chapter.
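
A restrict can be sketched in plain Python by treating a relation as a list of records. The seven records below are hypothetical values chosen so that four have a size of big or huge; they are not the contents of Figure 8-8, and the `restrict` function is an illustrative helper rather than part of any database product.

```python
# A small relation stored as a list of records (hypothetical values).
relation = [
    {"ID": 1, "color": "green", "size": "big"},
    {"ID": 2, "color": "blue", "size": "small"},
    {"ID": 3, "color": "red", "size": "huge"},
    {"ID": 4, "color": "green", "size": "small"},
    {"ID": 5, "color": "blue", "size": "big"},
    {"ID": 6, "color": "red", "size": "medium"},
    {"ID": 7, "color": "blue", "size": "huge"},
]


def restrict(rows, predicate):
    """Return only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


# Simple restrict: size is big or huge.
selected = restrict(relation, lambda row: row["size"] in ("big", "huge"))

# Compound restrict: the same size condition, and color is blue.
compound = restrict(
    relation,
    lambda row: row["size"] in ("big", "huge") and row["color"] == "blue",
)

print([row["ID"] for row in selected])
print([row["ID"] for row in compound])
```

Every selected record keeps all of its columns; only the number of rows is reduced.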


Project operations return entire columns for a table, in effect
subsetting the table vertically, as shown in Figure 8-8b. Database tables
may be quite large, and contain hundreds of items. A given analysis may
concern only a few of those items, and so the project operation allows
only those columns of interest from the table to be subset. This may
substantially increase processing speed, reduce the storage space
required, and ease viewing and analysis. In the example shown, ID, color,
and size are selected from a base relation to create a new relation.
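

The restrict and project operations described above can be sketched in a few lines of Python. This is only an illustration: the rows are invented, and the helper names restrict and project are our own, not functions from any GIS or database package.

```python
# Hypothetical relation; column names follow the text, row values are invented.
relation = [
    {"ID": 1, "color": "blue", "size": "big", "age": 10},
    {"ID": 2, "color": "green", "size": "small", "age": 30},
    {"ID": 3, "color": "red", "size": "huge", "age": 5},
    {"ID": 4, "color": "blue", "size": "tiny", "age": 12},
]


def restrict(rows, predicate):
    """Select rows (records): keep those for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Select columns (items): keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


# Restrict to records whose size is big or huge.
big_or_huge = restrict(relation, lambda r: r["size"] in ("big", "huge"))

# Project the ID, color, and size columns into a new, narrower relation.
id_color_size = project(relation, ["ID", "color", "size"])
```

Restrict reduces the number of rows while leaving the columns alone; project does the opposite.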


A union operation combines tables to return records found in either or
both tables. As shown in Figure 8-8c, the tables are "stacked" to return
a new table with members of both, but it does not show duplicate records
for those entries that appear in both tables. As such, the result of a
union is at least the size of the larger of the two tables, and no larger
than the sum of the two tables.


The tables used in a union must be of the same kind. That means they must
have the same set of variables or items. It makes little sense to find
the union of two tables when they do not share the same set of items.
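

A union can be sketched as follows. The tables and the union helper are hypothetical, written only to show the duplicate removal and the size bounds noted above; each input table is assumed to have no internal duplicates.

```python
def union(table_a, table_b):
    """Stack two tables with the same items, dropping duplicate records."""
    if table_a and table_b and set(table_a[0]) != set(table_b[0]):
        raise ValueError("union requires tables with the same set of items")
    seen = set()
    result = []
    for row in table_a + table_b:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Invented example tables; the record with ID 2 appears in both.
table_a = [{"ID": 1, "color": "blue"}, {"ID": 2, "color": "green"}]
table_b = [{"ID": 2, "color": "green"}, {"ID": 3, "color": "red"}]

combined = union(table_a, table_b)
```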


A join operation combines two tables through keys. Values in one or more
keys are matched across tables, and the information is combined based on
the matching. Figure 8-8d shows an example of a join across two tables,
in this case joined through the type item. Each type entry in the table
on the left is matched to the type value in the center table, and the
data are then joined or related through the values of type. The output
records to the right of Figure 8-8d are the combined attributes of both
tables. Records in the output table with ID values equal to 1 and 4 have
type values equal to a, as well as the color, size, and age associated
with type a. Those records with type b have the appropriate IDs (2, 3)
and the color, size, and age associated with type b.
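

The join just described might be sketched as below. The IDs and type values follow the text; the color, size, and age values are invented, and the join helper is our own illustration that assumes the join item is unique in the second table.

```python
def join(left, right, item):
    """Attach the attributes of the matching right row to each left row."""
    lookup = {row[item]: row for row in right}  # assumes item is unique in right
    return [{**row, **lookup[row[item]]} for row in left if row[item] in lookup]


# Left table: IDs with a type; right table: attributes recorded per type.
left_table = [
    {"ID": 1, "type": "a"},
    {"ID": 2, "type": "b"},
    {"ID": 3, "type": "b"},
    {"ID": 4, "type": "a"},
]
right_table = [
    {"type": "a", "color": "blue", "size": "big", "age": 10},
    {"type": "b", "color": "green", "size": "small", "age": 30},
]

joined = join(left_table, right_table, "type")
```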
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Hybrid Database Designs in GIS 


Data in a GIS are often stored using hybrid designs. Hybrid designs store
coordinate data using specialized database structures, and attribute data
in a relational database. Thousands to millions of coordinate pairs are
often required to represent the location and shape of objects in a GIS.
Even with modern computers, the retrieval of coordinate data stored in a
relational database design is often too slow. Therefore, the coordinate
data are frequently stored using structures designed for rapid retrieval.
This involves grouping coordinates for cartographic objects, for example,
storing ordered coordinate pairs for lines, and indexing or grouping
lines to identify polygons. Pointers are used to link related lines or
polygons, and unique identifiers link the geographic features (points,
lines, or polygons) to corresponding attribute data (Figure 8-9).




Topological relationships may be explic- 
itly encoded to improve analyses or to 
increase access speed. Addresses to the pre- 
vious and next data are explicitly stored in 
an indexing table, and pointers are used to 
connect coordinate strings. Explicitly 
recording the topological elements of all 
geographic objects in a data layer may 
improve geographic manipulations, including determination of adjacencies,
line intersection, polygon overlay, and network definition. Coordinates
for a given feature or part of a feature may be grouped, and these groups
indexed to speed manipulation or display.


Hybrid data designs typically store attribute data in a DBMS. These data
are linked to the geographic data through unique identifiers or labels
that are an attribute in the DBMS. Data may be stored in a manner that
facilitates the use of more than one brand of DBMS, and allows easy
transport of data from one DBMS to another.
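

The hybrid idea can be illustrated with a toy sketch in which coordinates live in one structure and attributes in another, linked only by a shared identifier. Everything here (the identifiers 101 and 102, the attribute names, the helper functions) is hypothetical and much simpler than the structures real GIS software uses.

```python
# Coordinate store: ordered coordinate pairs grouped per feature identifier.
coordinates = {
    101: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)],
    102: [(4.0, 0.0), (9.0, 0.0), (9.0, 3.0), (4.0, 3.0), (4.0, 0.0)],
}

# Attribute table: one record per feature, keyed by the same identifier.
attributes = {
    101: {"landuse": "Urban", "owner": "A"},
    102: {"landuse": "Farm", "owner": "B"},
}


def ring_area(ring):
    """Area of a closed polygon ring by the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


# Query the attribute table, then fetch geometry only for the selected features.
urban_ids = [fid for fid, rec in attributes.items() if rec["landuse"] == "Urban"]
urban_area = sum(ring_area(coordinates[fid]) for fid in urban_ids)
```

The attribute query never touches the coordinate lists, and the geometry is retrieved only for the features that were selected.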
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Selection Based on Attributes 


The Restrict Operator: Table Queries


Queries are among the most common operations in a DBMS. A query may be
viewed as the selection of (or restriction to) a subset of records based
on the values of specified attributes. Queries may be simple, using one
variable, or they may be compound, using one or more conditions on more
than one variable. One might search for all the parcels with unpaid
taxes, all census blocks larger than a square mile and with at least 200
inhabitants, or all fire hydrants that haven't been pressure tested, are
near high-rise buildings, and are farther than 300 m from the nearest
other fire hydrant. In concept, queries are quite simple, but basic query
operations may be combined to produce quite complex selections.


Many GIS software packages provide a query builder, a graphical user
interface (GUI) that helps in applying selection operations (Figure
8-10). Most GUIs include a list of available fields, operations, and a
sample or complete display of values for selected fields. The user
constructs queries by alternately selecting item names, operations, and
entering values. This query may then be applied, and features selected.
Often you may save complicated or long expressions, to be reused later on
different data sets.


Figure 8-10: A query builder GUI of the sort often provided in GIS
software, here from QGIS. Queries are composed in the bottom panel by
clicking on fields and operators.


The left side of Figure 8-11 demonstrates a simple query. A single
condition is specified, Area > 20. Each record in the selected set is
shown in gray in Figure 8-11.


The right side of Figure 8-11 demonstrates a compound query based on two
attributes. This query uses the AND condition to select records that meet
two criteria. Records are selected that have a Landuse value equal to
Urban, and a Municip value equal to City. All records that meet both of
these requirements are placed in the selected set, and records that fail
to comply with these requirements are in the unselected set. The Boolean
operations AND, OR, and NOT may be applied in combination to select
records that meet multiple criteria.


AND combinations typically reduce the size of the selected set when
compared to the individual component criteria. They provide a more
stringent set of conditions that must be met for selection. In the
example on the right side of Figure 8-11, the record with ID = 7 meets
the first criterion, Landuse = Urban, but it does not meet the second
criterion specified in the AND, Municip = City. Thus, the record with
ID = 7 is not selected. ANDs add restrictions that winnow the selected
set.


OR combinations typically increase or add to a selected set in compound
queries. OR conditions may be considered as inclusive criteria. The OR
adds records that meet a criterion to a set of records defined by
previous criteria. In the query on the left side of Figure 8-12, the
first criterion, Area > 20, results in the selection of records 2, 4, 5,
and 6. The OR condition adds any records that satisfy the criterion
Municip = City, in this case the record with ID = 1.


The NOT is the negation operation, and may be interpreted as meaning
"select those records that do not meet the condition." The right side of
Figure 8-12 demonstrates the negation operation. The operation may be
viewed as first substituting equals for the NOT, and identifying all
records. Then the remaining records are placed in the selected set, and
the identified records placed in the unselected set.
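

The simple, AND, OR, and NOT selections discussed above can be sketched in Python. The seven records below are illustrative values patterned on the selections described for Figures 8-11 and 8-12, not a transcription of the figures, and the select helper is our own.

```python
# Illustrative parcel records (values assumed for this sketch).
parcels = [
    {"ID": 1, "Area": 10.5, "Landuse": "Urban", "Municip": "City"},
    {"ID": 2, "Area": 330.3, "Landuse": "Farm", "Municip": "County"},
    {"ID": 3, "Area": 2.4, "Landuse": "Suburban", "Municip": "Township"},
    {"ID": 4, "Area": 96.0, "Landuse": "Suburban", "Municip": "County"},
    {"ID": 5, "Area": 22.1, "Landuse": "Urban", "Municip": "City"},
    {"ID": 6, "Area": 30.2, "Landuse": "Farm", "Municip": "Township"},
    {"ID": 7, "Area": 4.4, "Landuse": "Urban", "Municip": "County"},
]


def select(rows, condition):
    """Return the IDs of the records that satisfy the condition."""
    return [row["ID"] for row in rows if condition(row)]


# Simple query: one criterion.
simple = select(parcels, lambda r: r["Area"] > 20)

# AND winnows: both criteria must hold, so record 7 (Urban, County) is dropped.
and_query = select(parcels, lambda r: r["Landuse"] == "Urban" and r["Municip"] == "City")

# OR adds: record 1 joins the Area > 20 set because it is in a City.
or_query = select(parcels, lambda r: r["Area"] > 20 or r["Municip"] == "City")

# NOT negates: everything that is not Urban.
not_query = select(parcels, lambda r: not r["Landuse"] == "Urban")
```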


ANDs, ORs, and NOTs can have complex effects when used in compound
conditions, and the order or precedence is important in the query.
Combinations of these three operations may be used to perform very
complex selections.


Figure 8-11: Simple selection, using one criterion to select records
(left), and compound selection (right).


Figure 8-12: OR and NOT compound selections. Left, OR selection: records
with (Area > 20.0) OR (Municip = City). Right, NOT selection: records
with Landuse NOT Urban.


Figure 8-13 shows the results of a complex query, combining AND, OR, and
NOT operations. Here, the square brackets choose rows with a Landuse
value equal to Urban, and Mill Rate values equal to B. Row 5 is the only
record satisfying these criteria. The curly brackets select those rows
that are not in a City, and with a Density greater than 200. This selects
Row 3, and the final selected set includes both rows, by the OR
operation. Selection operations may get quite complicated, and long
selection "sentences" may be saved, that is, the syntax copied in a text
file or other repository, and applied when needed.


Figure 8-13: An example of a complex selection, combining various
selection operations. Complex selection: records with
[(Landuse = Urban) AND (Mill Rate = B)] OR
{NOT (Municip = City) AND (Density > 200)}


While database queries are typically applied to tables, we must remember
that in a GIS, the tables are usually connected in some way to geographic
features. Selections of table elements imply the selection of associated
geographic elements. It is always a good idea to verify that the
selection works as expected when first using a software package.
Verification is often easiest by viewing the selection results, either on
the table, the geography, or both. Figure 8-14 illustrates the results
from three separate selection criteria: a) that county population density
be greater than 50 persons per square mile, b) that the median age be
less than 40 years, and c) that housing vacancy rates be greater than
10%. The rightmost panel shows counties returned from a query specifying
that criteria a and b and c all be met. The accuracy of the query may be
quickly verified by inspecting maps of the component and final
selections.


Figure 8-15 demonstrates that queries are not generally distributive. For
example, if OP1 and OP2 are operations, such as AND or NOT, then

OP1 (ConditionA OP2 ConditionB)                          (8.1)

is not always the same as

(OP1 ConditionA) OP2 (OP1 ConditionB)                    (8.2)


For example,

NOT [(Landuse = Urban) AND (Municip = County)]           (8.3)

does not yield the same set of records as the expression

[NOT (Landuse = Urban)] AND [NOT (Municip = County)]     (8.4)
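

The difference between the two expressions can be checked directly. The sketch below uses invented records patterned on the earlier parcel examples. It also shows, as a standard Boolean identity (De Morgan's law) rather than anything specific to GIS software, that the negated AND is equivalent to an OR of the negated conditions.

```python
# Illustrative records (values assumed for this sketch).
parcels = [
    {"ID": 1, "Landuse": "Urban", "Municip": "City"},
    {"ID": 2, "Landuse": "Farm", "Municip": "County"},
    {"ID": 3, "Landuse": "Suburban", "Municip": "Township"},
    {"ID": 4, "Landuse": "Suburban", "Municip": "County"},
    {"ID": 5, "Landuse": "Urban", "Municip": "City"},
    {"ID": 6, "Landuse": "Farm", "Municip": "Township"},
    {"ID": 7, "Landuse": "Urban", "Municip": "County"},
]


def ids(condition):
    """Set of IDs of the records that satisfy the condition."""
    return {p["ID"] for p in parcels if condition(p)}


def urban(p):
    return p["Landuse"] == "Urban"


def county(p):
    return p["Municip"] == "County"


# NOT applied to the whole AND: only record 7 (Urban and County) is excluded.
not_of_and = ids(lambda p: not (urban(p) and county(p)))

# NOT "distributed" over the AND: a much smaller set.
and_of_nots = ids(lambda p: (not urban(p)) and (not county(p)))

# De Morgan's law: distributing NOT over AND requires switching AND to OR.
or_of_nots = ids(lambda p: (not urban(p)) or (not county(p)))
```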


SQL was initially developed by the International Business Machines
Corporation, but is supported by a number of software vendors. SQL is a
nonprocedural query language in that the specification of queries does
not depend on the structure of the data. The language can be powerful,
general, and transferable across systems, and so has become widely
adopted.
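

As a hedged illustration of an SQL query, the sketch below runs a compound selection of the kind shown in Figure 8-13 against a small in-memory SQLite database using Python's standard sqlite3 module. The table name parcels, the column names, and all row values are assumptions made for this example; only the form of the WHERE clause follows the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels ("
    "ID INTEGER PRIMARY KEY, Landuse TEXT, Municip TEXT, "
    "Mill_Rate TEXT, Density REAL)"
)

# Invented rows, chosen so that records 3 and 5 satisfy the query.
rows = [
    (1, "Urban", "City", "A", 350.0),
    (2, "Farm", "County", "C", 12.0),
    (3, "Suburban", "Township", "B", 250.0),
    (4, "Suburban", "County", "C", 90.0),
    (5, "Urban", "City", "B", 500.0),
    (6, "Farm", "Township", "A", 8.0),
    (7, "Urban", "County", "A", 150.0),
]
conn.executemany("INSERT INTO parcels VALUES (?, ?, ?, ?, ?)", rows)

# The query states what is wanted, not how to scan the table.
query = """
SELECT ID FROM parcels
WHERE (Landuse = 'Urban' AND Mill_Rate = 'B')
   OR (NOT (Municip = 'City') AND Density > 200)
ORDER BY ID
"""
selected = [row[0] for row in conn.execute(query)]
```

The parentheses make the intended order of the AND, OR, and NOT operations explicit rather than relying on default precedence.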


SQL allows us to both define and manipulate data. Data types and tables
containing variables of a given type may be defined. Standard operations
are used to manipulate data, for example, to select, delete, insert, and
update records in a database. Long or complicated queries may be saved in
text files, or as scripts, that may be debugged, modified, or used later.
These scripts may be quite long.


Figure 8-15: Selection operations may not be distributive, and the order
of application is very important. Left: NOT [(Landuse = Urban) AND
(Municip = County)]. Right: [NOT (Landuse = Urban)] AND [NOT (Municip =
County)].
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iven their complexity and capabilities. SQL ever SQL is зо various SQL 
Shae REA E. 
write, test, and automate these scripts. been developed and are widely used. 


Because SQL wasn't initially designed 
for spatial data processing, many spatial 
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Joining Tables 


Relational databases are so powerful in part because we can structure our
data to reduce duplication, ease maintenance, and give flexibility; much
of this flexibility is because we can join tables. A join, also known as
a relate, uses columns in one table to match rows across tables. Joins
were illustrated in Figure 8-7 and Figure 8-8, part d, but additional
examples are warranted.


Joins are based on join items. In their simplest form, a single column in
one table is matched to a column in another table, and a new table
displayed by combining rows for matched values. Figure 8-16 shows a
simple join between two tables. Here, Code_A and Code_B, respectively,
are used. If we call Table A the "target table," and Table B the "source
table," then our join consists of matching the values for a row in the
source table to the target table, and "copying" the values from the
matched rows to the output table. The values aren't truly copied, but
rather associated and displayed together, to save space and speed
operation.


Primary Keys and Joins 


We've noted earlier that keys are crucial to relational databases, and as
such they have certain special characteristics. There are several kinds
of keys, most important the primary key, a chosen item or items that
uniquely identify each row in a table. We often create and generate
unique numbers as keys; for example, a social security number is unique
to a person, a parts ID number is often unique to an object, and SKU
numbers are unique to items in a store. ID_A can serve as a primary key
for Table A in Figure 8-16 because the values uniquely identify the rows.
Table B in Figure 8-16 has three columns that could serve as primary
keys: Type_B, Code_B, or Size_B. These are all candidate keys, because
each could serve as a primary key. One of the candidate keys is anointed
the primary key.


Figure 8-16: This figure illustrates a simple join. Table B is joined to
Table A, matching the Code_B to corresponding Code_A values to create Out
Table.

Figure 8-17 illustrates the join concept. A join matches rows by key
values. In Step 1, Code_B X values in Table B are matched to the Code_A X
values in Table A. The first row in Out Table is a composite of row 1
from Table A (ID_A=1, Type_A=Fr, Batch_A=1001) and row 1 from Table B
(Type_B=Ag, Size_B=1). A similar matching is performed for each X in
Table B, and a row written in the Out Table.


Step 2 in Figure 8-17 shows a similar matching for the join variable with
a value of Y. Rows are matched from Table B to Table A, and written to
Out Table. There is only one matching row in this case. Step 3 shows a
match of the join variable Z value. Again, there is a cross-table
matching, creating new rows in the Out Table. Since there are no q values
in Table A, in this particular version of a join, there are no rows
written in the output table.


Note that this example is for illustration, but database systems usually
don't follow a sequential process as shown, nor write a new table. Sorts
and indexing are often used to speed matches, and matched data are shown
in what appears as a single table, but are simply displayed from their
original tables; no new table is written unless expressly requested.


Also note that non-matching elements are discarded in this example. We
may specify a join that saves some or all of the non-matching elements,
and we should be aware of these different join variants.


The primary key, or an item that could serve as a primary key, is usually
used as a join item in the "source" table of a join operation. This is
illustrated in the join in Figure 8-16 and Figure 8-17. Code_B in the
source Table B uniquely identifies each row in that table, and is used to
link to Table A via Code_A. As described later, when items that are not
candidate keys for the source table are used in joins, you often get
ambiguous or erroneous joins.


Figure 8-16 and Figure 8-17 illustrate the most common type of join,
known as an inner join, where unmatched rows are discarded. An
alternative outer join saves the information for non-matching rows,
placing blank or null values for missing items (Figure 8-18). There are
both left and right outer joins, depending on whether the retained
nonmatches are in the target or source tables. There is also a natural
join, in which equally named columns aren't copied, and cross joins, in
which all rows in one table are combined with all rows of another table;
for example, a cross join of Tables A and B in Figure 8-16 would result
in a table with 24 rows (6 rows for A times 4 rows for B).
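

The inner, outer, and cross joins named above can be compared with a short sqlite3 sketch. The tables table_a and table_b, their columns, and their values are invented stand-ins for a six-row target and a four-row source; the SQL join keywords themselves are standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (ID_A INTEGER PRIMARY KEY, Type_A TEXT, Code_A TEXT);
CREATE TABLE table_b (Code_B TEXT PRIMARY KEY, Value_B INTEGER);
INSERT INTO table_a VALUES
    (1, 'Fr', 'X'), (2, 'Ag', 'Y'), (3, 'Fr', 'X'),
    (4, 'Gr', 'Z'), (5, 'Ag', 'Z'), (6, 'Fr', 'W');
INSERT INTO table_b VALUES ('X', 1), ('Y', 2), ('Z', 3), ('Q', 4);
""")

# Inner join: target rows with no match in the source (Code_A = 'W') are dropped.
inner = conn.execute(
    "SELECT a.ID_A, b.Value_B FROM table_a AS a "
    "INNER JOIN table_b AS b ON a.Code_A = b.Code_B ORDER BY a.ID_A"
).fetchall()

# Left outer join: every target row is kept; missing source values become NULL.
left_outer = conn.execute(
    "SELECT a.ID_A, b.Value_B FROM table_a AS a "
    "LEFT OUTER JOIN table_b AS b ON a.Code_A = b.Code_B ORDER BY a.ID_A"
).fetchall()

# Cross join: every row of A paired with every row of B.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table_a CROSS JOIN table_b"
).fetchone()[0]
```

The same two tables yield 5, 6, or 24 output rows depending only on the join type, which is why it matters to know which variant a software package applies by default.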


Mastering the differences between these types of joins is perhaps a bit
advanced for an introductory GIS course, but software may set any of
these different types of joins as a default method, and they may not be
explicitly identified by name. These and other different types of joins
are covered in depth in most introductory database books, but can be
confusing to distinguish and apply without some practice. We introduce
them here to:

• warn you of the differences between different types of joins, and to
  emphasize that different types of joins will usually produce different
  results, even when applied to the same data, and

• stress that there are standardized names for different types of joins,
  although not all GIS software packages use them. You should verify how
  joins work when first using new software.


Figure 8-18 illustrates a difference between inner and outer joins. The
center table is joined to the leftmost table on the item Type. A
resulting inner join is shown in a. Note that only rows 2, 3, and 6 from
the "target" table on the left, with values Ag and Gr for Type (our key),
are recorded in the output table in Figure 8-18a, because those are the
only Type values found in both tables. Information in rows 1, 4, and 5 is
not retained in the output table.


Figure 8-18b shows an outer join, in which unmatched target table rows
are retained. Null or empty values are placed for the non-matching
attributes of the source table, as shown by the dashes in the Code item
for rows 1, 4, and 5 of output table b.


You may have deduced by now that the join items are crucial when joining
tables. If the join items are not correctly created, then the joins will
likely produce unintended results. We must pay attention to the join
items across tables, particularly how many values match for our joining
items across the source and target tables.


Foreign Keys


A foreign key is an item in a target table that may be used to
unambiguously link to rows in another table. Most of our examples have
used foreign keys, and it is helpful to identify them explicitly in
tables so that we may maintain referential integrity, that is, to make
sure we correctly join among tables.


We usually use a primary key or candidate key in the source table to link
to a foreign key in another table (Figure 8-19). Hunter ID is the primary
key for the Hunter Table. Tag ID is the primary key for the Bag Record
Table. Hunter ID is a foreign key in the Bag Record Table and serves as a
link to Hunter Table. This ensures that the specific harvest can be
traced to the hunter, e.g., a Turkey, Bambi, and Rudolph can be
unambiguously traced back to Mark Trail.
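

Referential integrity can be enforced by the database itself. The sketch below, using Python's sqlite3 module, declares a foreign key between two tables modeled loosely on the hunter example. The table layouts, the ID values, and the column names are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE hunter (
    hunter_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE bag_record (
    tag_id    INTEGER PRIMARY KEY,
    animal    TEXT,
    hunter_id INTEGER NOT NULL REFERENCES hunter(hunter_id)  -- foreign key
);
INSERT INTO hunter VALUES (1, 'Mark Trail');
INSERT INTO bag_record VALUES (100, 'Turkey', 1);
""")

# A bag record that points at a hunter who does not exist is rejected.
try:
    conn.execute("INSERT INTO bag_record VALUES (101, 'Deer', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Every stored bag record can be traced back to exactly one hunter.
traced = conn.execute(
    "SELECT b.animal, h.name FROM bag_record AS b "
    "JOIN hunter AS h ON b.hunter_id = h.hunter_id"
).fetchall()
```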


In another common example, a zip code may serve as a primary key for a
geography table, with each row in the table associated with a specific
polygon in a data layer. A zip code may be added as a foreign key to
another table that contains economic characteristics for each polygon. We
might also have yet another table with historical population data for
each zip code. Zip codes in these tables may serve as foreign keys in
joins to the polygon table.


Figure 8-19: Here the Hunter ID connects the Hunter Table to the Bag
Record Table. The Bag Record Table has a primary key, Tag ID, as well as
a foreign key, Hunter ID. With apologies to Mel Blanc.

Most joins should involve a one-to-one or a one-to-many relationship
between the source and the target join items. This is why we usually use
a primary key in our source table as the column for a join, or we use a
candidate key as a join column. This avoids a many-to-one relationship
between source and target columns, which often causes problems. The
following paragraphs illustrate these problems.


A one-to-one relationship means that there may be one and only one
instance in a join item of a target table that matches one and only one
instance of a join item in a source table. The left side of Figure 8-20
illustrates a one-to-one match for the items Id1 and Id2. Each value of
Id2 matches only one value of Id1. Note that not all values of Id1 have a
match in Id2.


Tables may also be unambiguously joined if there is a one-to-many
relationship between the source join item and the target table join item.
The join on the right side of Figure 8-20 shows a one-to-many
relationship between the source item Id4, and the target item Id3. Note
there are three instances of Y in Id3, but they unambiguously match with
the one value of Y in Id4.


Figure 8-20: One-to-one inner join result (left) and one-to-many inner
join result (right).


We often run into problems when we attempt joins with items that have a
many-to-one or a many-to-many relationship. These are often considered
ill-matching keys, in that results from a join can be indeterminate: you
can't predict the results in advance, or they may change due to spurious
factors, such as pseudo-random effects of row ordering. Since you're
often not sure of the results you'll get, many-to-one or many-to-many
relationships from the source to target keys are rarely a good idea. By
requiring the source item in a join to be a primary key or candidate key,
we avoid these erroneous joins. We use a column or set of columns that
uniquely identifies the rows of the source table, so that there is a
one-to-one or one-to-many relationship in our join.


Figure 8-21 shows an example of an ambiguous join. The item Type in
Source Table is not a key, and this results in a many-to-many join. There
are two rows in the Source Table with a Type value of Fr. Both rows may
fairly match the Fr key values found in the Target Table, resulting in an
ambiguous assignment for the values of Code for those rows. Both V and N
are equally supported, hence our results are uncertain, as shown in the
Output Table. Such uncertainty is rarely a good thing in table
manipulations or analyses. We would have the same ambiguity if there were
only one value of Fr in the Target Table, creating a many-to-one
relationship from Source to Target. Many-to-one or many-to-many joins
should be avoided, except in a constrained set of circumstances.


Figure 8-21: An example of a many-to-many table join, which should be
avoided.
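

Before joining, it can help to check which kind of relationship the join items actually have. The function below is a hypothetical helper, not part of any GIS package; it classifies the relationship from the source item to the target item by counting repeated values among those the two columns share.

```python
from collections import Counter


def relationship(source_values, target_values):
    """Classify a join, from source item to target item, by value counts."""
    source_counts = Counter(source_values)
    target_counts = Counter(target_values)
    shared = set(source_counts) & set(target_counts)
    source_side = "many" if any(source_counts[v] > 1 for v in shared) else "one"
    target_side = "many" if any(target_counts[v] > 1 for v in shared) else "one"
    return f"{source_side}-to-{target_side}"


# Source item is a key (unique values), target repeats Y: safe to join.
safe = relationship(["X", "Y", "Z"], ["X", "Y", "Y", "Y"])

# Source item repeats Fr, so the value to assign to target rows is ambiguous.
ambiguous = relationship(["Fr", "Fr", "Ag"], ["Fr", "Fr", "Ag"])
```

A result that begins with "many" signals that the source item is not a candidate key and the join should be reconsidered.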


We typically are very careful about how we assign and when we alter the
values of primary keys, in part because of their importance in joins. We
must ensure that we don't duplicate key values within a column, and we
have to be careful in how we reassign values to the primary key through
calculations or other modifications. Because null values match nothing,
or everything, we also don't allow them to occur in primary keys. Many
database systems have checks to avoid these errors. For example, many
online databases use an email address as a primary identifier, and they
prevent you from registering twice under the same email address. Many of
the errors when using databases result from corruption of a primary key.
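

The checks mentioned above can be seen in a small sqlite3 sketch. The account table and the addresses are invented. The explicit NOT NULL is included because SQLite, unlike most database systems, does not automatically forbid nulls in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (email TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO account VALUES ('a@example.org', 'First')")


def try_insert(email, name):
    """Attempt an insert; report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (email, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert("a@example.org", "Second")  # repeats a key value
null_accepted = try_insert(None, "Nobody")                  # null key value
new_accepted = try_insert("b@example.org", "Third")         # new, unique key
```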


Concatenated Keys 


Most examples show keys consisting of a single column. These are most
common because they are easiest to envision, manage, and use. This is the
idea behind unique identifiers in many databases, for example, an invoice
number for a business, unique part ID numbers for a warehouse, or a
museum accession number. These unique IDs allow unique items to be simply
identified in a table.


While single-column identifiers are most common, we frequently use
multiple columns as keys. These are often used when we have large,
multi-table databases that we wish to combine in several different ways.
Data from the United States Census, or the U.S. SSURGO database discussed
in the previous chapters, use multiple tables, many with two or more
columns used in combination as a key.


When multiple columns are used together as a key, it is called a
concatenated key. Concatenated keys are typically formed by two columns,
and rarely more than three columns.


Figure 8-22 illustrates a concatenated key, here used to uniquely
identify U.S. counties. Each U.S. state is assigned a unique Federal
Information Processing Standard (FIPS) code, and every county within a
state is assigned a unique code, but unique only within the state. This
allows new codes to be assigned at the state level, for example, if a new
state is added, or new codes to be assigned within a state, as when
counties are split to form new counties. This also allows quick selection
of data by state. County FIPS codes (cFIPS) are assigned sequentially as
odd numbers within these states, from 1 up to the last county within the
state. cFIPS alone cannot be used as a key; for example, cFIPS = 1
specifies both Fairfield County, Connecticut, and Barnstable County,
Massachusetts. A concatenated key using both state and county FIPS
numbers is needed to uniquely distinguish counties across multiple
states.
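

A concatenated key can be modeled as a tuple of column values. In the sketch below, the column name sFIPS for the state code and the is_key helper are our own assumptions; the four county codes are the published FIPS values for those counties.

```python
counties = [
    {"sFIPS": 9, "cFIPS": 1, "name": "Fairfield County, Connecticut"},
    {"sFIPS": 9, "cFIPS": 3, "name": "Hartford County, Connecticut"},
    {"sFIPS": 25, "cFIPS": 1, "name": "Barnstable County, Massachusetts"},
    {"sFIPS": 25, "cFIPS": 3, "name": "Berkshire County, Massachusetts"},
]


def is_key(rows, columns):
    """True if the combined values of the columns are unique for every row."""
    values = [tuple(row[col] for col in columns) for row in rows]
    return len(values) == len(set(values))


county_code_alone = is_key(counties, ["cFIPS"])        # repeats across states
concatenated = is_key(counties, ["sFIPS", "cFIPS"])    # unique nationwide

# Index the table by the concatenated key for unambiguous lookups.
by_key = {(row["sFIPS"], row["cFIPS"]): row["name"] for row in counties}
```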


Figure 8-22: A concatenated key, here the combined state FIPS (sFIPS) and
county FIPS codes (cFIPS).


Multi-table Joins


We may have more than one potential key in a table (remember, the key can
consist of one or more columns), but we usually design tables with a main
key. We may also join many tables to a single table, often using
different target items for each join.


Figure 8-23 shows an example of a multi-table join with distinct keys.
School Table may be considered the "foundational" table, and the two
tables named County and District are joined to School Table to create an
Output Table.


Note that these two joins are based on different items. County Table is
joined to School Table based on values in the columns labeled Cty, while
the District Table is joined to the School Table based on the values
found in the columns labeled Dist. Our output table is shown here without
duplicates of the columns (e.g., Cty and Dist each appear only once in
the Output Table), although the "copies" are often displayed.


Both joins are one-to-many cases: one value in the source columns may
match many row values in the target columns. Also note that each of the
join items in the source tables are keys; Cty in the County Table
uniquely identifies each county, and DistID in the District Table
uniquely identifies each district, so both are keys that uniquely
identify rows in their source tables.
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Normal Forms in Relational Databases 


Keys and Functional Dependencies


The previous sections should point out the need to carefully structure
our tables in a relational database, and that keys in tables are
especially important. Poorly designed tables can suffer from serious
problems in performance, consistency, redundancy, and maintenance. Data
that are stored in large tables may be redundant or have wasted space,
and long searches may be needed to select a small set of records. Updates
on large tables may be slow, and the deletion of a record may result in
the unintended deletion of valuable data from the database. Smaller,
carefully constructed tables are usually more useful.


Consider the data in Figure 8-24, in which land records are stored in a
single table. Attributes include Parcel-ID, Alderman, Tship-ID,
Tship_name, Thall-add, Own-ID, Own_name, and Own_add. Some information is
stored redundantly; for example, changing the Alderman for Tship-ID 12
would require changing many rows, and identifying all parcels with Yamane
as an owner would require a search of all records for several columns in
the table. This storage redundancy is costly both because it takes up
disk space and because each extra record adds to the search and access
times. A second problem comes with changes in the data. For example, if
Devin, Yamane, and Prestovic sell the parcel they jointly own (first data
row), deleting the parcel record for Devin would purge the database of
her address and tax payment history. If these data on Devin were required
later, they would have to be reentered from an external source.


We may place relational databases in normal forms to avoid many of these
problems. Data are structured in sequentially higher normal forms to
improve correctness, consistency, simplicity, nonredundancy, and
stability. There are several levels in the hierarchy of normal forms, but
the first three levels, known as the first through third normal forms,
are most common. Data are usually structured sequentially, that is, first
all tables are converted to first normal forms, then converted to second
and then third normal forms as needed. Prior to describing normal forms
we must introduce some terminology and properties of relational tables.


Figure 8-24: Land records data in unnormalized form. The table is shown
in two parts because it is too wide to fit across the page.

As noted earlier, relational tables use keys to connect data. Any
column(s) that uniquely define(s) the rows of a table is called a
candidate key. The primary key for indexing a table is chosen from the
set of candidate keys, using a single-column key if possible. We
typically manage the chosen key such that it will always be valid as a
primary key for the table. The Parcel-ID is a key for the table in Figure
8-24, uniquely identifying each row in the table.


Functional dependency is another important concept. Attributes are
functionally dependent if at a given point in time each value of the
dependent attribute is determined by a value of another attribute.


Figure 8-25 illustrates the concept of
functional dependency. The table contains a
parts list, with ID as the primary key, and
part Name, CNum, CType, Thread, and Angle
attributes. The ID is unique for each row, and
so by definition, all other items are
functionally dependent on ID. If we know the
value of ID is 1, then we know the part Name
is Tee. We denote this as:


ID -> Name


We also see that Name is functionally
dependent on CNum. If we know a value for
CNum, say, 2, we know the value of Name
will equal Cap. We see that the converse is
also true here; CNum is also functionally
dependent on Name. Note that this is not
always true, as shown for CType and Thread:


CType -> Thread is true,
but
Thread -> CType is not true.


Why? Because for the value of Thread
equal to 14, CType may be either E or Er,
violating our definition of functional
dependence.
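

The test for functional dependence described above can be sketched in a few lines of Python. This is only an illustration: the function name is_functionally_dependent and the small parts list are assumptions made for the example, not the contents of Figure 8-25. Note also that the check only reports whether the rows currently stored are consistent with a dependency; it cannot show that the dependency is a rule of the database.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to one `dependent` value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # same determinant value, two different dependents
        seen[key] = value
    return True


# Hypothetical parts list, loosely patterned on the table described in the text.
parts = [
    {"ID": 1, "Name": "Tee", "CNum": 1, "CType": "A", "Thread": 14},
    {"ID": 2, "Name": "Cap", "CNum": 2, "CType": "B", "Thread": 14},
    {"ID": 3, "Name": "Ext", "CNum": 3, "CType": "C", "Thread": 8},
    {"ID": 4, "Name": "Cap", "CNum": 2, "CType": "A", "Thread": 14},
]

print(is_functionally_dependent(parts, "CType", "Thread"))  # CType -> Thread
print(is_functionally_dependent(parts, "Thread", "CType"))  # Thread -> CType
```

In this hypothetical list, a Thread value of 14 occurs with more than one CType, so the second test fails while the first succeeds.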


In our example in Figure 8-24, we
know that Own_add is only dependent on
Own_name. In other words, each owner can
only have one resident address, e.g., we may
not allow the entry of a second resident
address. Therefore, for a given Own_name,
for example Prestovic, the Own_add is
determined. In a similar way, there is only
one town hall address, Thall_add, for each
township name, Tship_name, or


Own_name -> Own_add
Tship_name -> Thall_add


Remember, these indicate that Own_add is
functionally dependent on Own_name, and
Thall_add is functionally dependent on
Tship_name. We must always bear in mind
that this functional dependency is something
we enforce. Unless we place safeguards
during data entry and manipulation, we may
change data so that we "break" the
functional dependency, for example, by adding a
second owner address for an owner name.


Functional Dependencies

ID -> Name, CNum, CType, Thread, Angle
CNum -> Name (or Name -> CNum)
CType -> Thread

Figure 8-25: Example of functional dependencies.


Functional dependencies are transitive,
so if A -> B, and B -> C, then A -> C. This
notation means that if B is functionally
dependent on A, and C is functionally
dependent on B, then C is functionally dependent
on A.


While relational database designs are
flexible, the use of keys and functional
dependencies places restrictions on
relational tables:


• There cannot be repeated records, that
is, there can be no two or more rows
where all attributes are equal.

• There must be a primary key in a table.
This key allows each record to be
uniquely identified.

• No member of a column that forms
part of the primary key can have a null
value. This would allow multiple
records which could not be uniquely
identified by the primary key.
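

The three restrictions listed above can be checked mechanically. The sketch below is one possible way to do so, assuming a table held as a list of Python dictionaries; the function name check_table, the message strings, and the sample values are hypothetical and are used only for illustration.

```python
def check_table(rows, key_columns):
    """Report repeated records, duplicate keys, and nulls in key columns."""
    problems = []
    seen_rows = set()
    seen_keys = set()
    for i, row in enumerate(rows):
        as_tuple = tuple(sorted(row.items()))
        if as_tuple in seen_rows:
            problems.append(f"row {i}: repeated record")
        seen_rows.add(as_tuple)

        key = tuple(row[c] for c in key_columns)
        if any(v is None for v in key):
            problems.append(f"row {i}: null value in primary key")
        elif key in seen_keys:
            problems.append(f"row {i}: duplicate primary key")
        seen_keys.add(key)
    return problems


# Hypothetical parcel/owner pairs with a concatenated key.
good = [
    {"Parcel-ID": 2303, "Own-ID": 122},
    {"Parcel-ID": 2303, "Own-ID": 457},
]
bad = good + [
    {"Parcel-ID": 2303, "Own-ID": 457},  # repeats an existing record
    {"Parcel-ID": 618, "Own-ID": None},  # null in part of the key
]

print(check_table(good, ["Parcel-ID", "Own-ID"]))
print(check_table(bad, ["Parcel-ID", "Own-ID"]))
```

A database management system would normally enforce these rules itself when a primary key is declared; the sketch only makes the rules explicit.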


The First and Second Normal Forms


We begin creating tables in normal
forms by first gathering all our data, often in
a single table. Normal forms typically result
in many compact, linked tables, so it is quite
common to split tables as the database is
normalized, or placed in normal forms. After
normalization, the tables have an indexing
system that speeds searches and isolates
values for updating.


Tables with repeat groupings, as in the
table at the top of Figure 8-26, are
unnormalized. A repeating group exists in a
relational table when an attribute is allowed to
have more than one value represented within
a row. Owner-ID repeats itself for dwellings
with multiple owners.


A table is in first normal form when
there are no repeat columns. The Land
Records table at the bottom of Figure 8-26
has been normalized by placing each owner
into a separate row. This is a table in the first
normal form (1NF) because each column
appears only once in the table definition. A
1NF is the most basic level of table
normalization. However, the 1NF table structure
still suffers from excessive storage
redundancy, inefficient searches, and potential
loss of data on updating. First normal forms
have an advantage over unnormalized tables
because queries are easier to code and
implement. Tables in 1NF are usually converted to
higher-order normal forms, usually to at
least third normal form, 3NF, but it is useful
to understand second normal forms before
describing 3NF tables.
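

The conversion from an unnormalized table to 1NF can be sketched as follows. The example assumes the repeating group is stored as a list of owner identifiers inside each row, which is only one way to represent repeated owner columns; the function name to_first_normal_form and all parcel and owner values are hypothetical.

```python
def to_first_normal_form(rows, repeating_column):
    """Write one output row per value in the repeating group."""
    flat = []
    for row in rows:
        for value in row[repeating_column]:
            new_row = dict(row)          # copy the non-repeating attributes
            new_row[repeating_column] = value
            flat.append(new_row)
    return flat


# Hypothetical unnormalized rows: several owners stored in a single row.
unnormalized = [
    {"Parcel-ID": 2303, "Tship-ID": 12, "Own-ID": [122, 457]},
    {"Parcel-ID": 618, "Tship-ID": 14, "Own-ID": [337]},
]

for record in to_first_normal_form(unnormalized, "Own-ID"):
    print(record)
```

Each owner now occupies a separate row, at the cost of repeating the parcel and township values, which is the redundancy the higher normal forms reduce.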


A table is in second normal form (2NF)
if it is in first normal form and every non-key
attribute is functionally dependent only on
the primary key, or on transitive functional
dependencies of the primary key. Remember
that functional dependency means that
knowing the value for one attribute of a
record automatically specifies the value for
the functionally dependent attribute. The
non-key attributes may be directly dependent
on the primary key through some functional
dependency, or they may be dependent
through a transitive dependency. The Land
Records table in 1NF at the bottom of Figure
8-26 has only one possible primary key, the
composite of Parcel-ID and Own-ID. No other
combination uniquely identifies each row.
However, this table is not in second normal
form because it has non-key attributes that
are not functionally dependent only on the
primary key attributes. For example,
Tship_name and Thall_add are functionally
dependent on Tship-ID.


Figure 8-26: Relational table in unnormalized (top) and first normal forms (bottom).


The Land Records table at the bottom of
Figure 8-26 is repeated at the top of Figure
8-27. This table exhibits the primary
disadvantages of the first normal form.
Parcel-ID, Alderman, and Tship-ID are
duplicated when there are multiple owners of
a parcel, causing burdensome data
redundancy. Each time these records are
updated, for example when a new Alderman is
elected, data must be changed for each
duplicate record. If a parcel changes hands
and the seller does not own another parcel
represented in the table, then information on
the seller is lost.


Some of these disadvantages can be
removed by converting the first normal form
table to a group of second normal form
tables. To create second normal form tables,
we make every non-key attribute fully
dependent on a primary key in the new
tables. Note that the 1NF table will often be
split into two or more tables when converting
to 2NF, and each new table will have its
own key. Any non-key attributes in the new
tables will be dependent on the primary
keys. The bottom of Figure 8-27 shows our
Land Records converted to second normal
form. Each of the three tables in second
normal form isolates an observed functional
dependency, so each table and dependency
will be described in turn.


How do we systematically apply this
2NF criterion, that the non-key attributes be
functionally dependent only on the primary
key, directly or through a transitive
functional dependency? We must 1) specify the
primary key, 2) identify the main functional
dependencies, and 3) project the 1NF table
across the key and dependency columns.


First, we must identify the primary key.
In our example here, the simplest primary
key is the (concatenated) key that is the
combination of Parcel-ID and Owner-ID. If our
primary key is a single item, then the table is
already in 2NF by definition, because all
non-key attributes will depend on the
primary key. However, if our primary key is
more than one column, we may have further
work to convert to 2NF, focusing on
dependence on the components of the primary key.


Our second step is to identify the
functional dependencies. We know that parcels
occur in only one township, and that each
township has a unique Tship-ID, a unique
Tship_name, a unique Thall_add, and one
Alderman. This means that if we have
identified a parcel by its Parcel-ID, the Alderman,
Tship-ID, Tship_name, and Thall_add are
known. We assign a unique identifier to each
parcel of land, and the Alderman,
Tship_name, and Thall_add are all dependent
on this identifier. If we know the parcel
identifier, we know these remaining values. This
is the definition of functional dependency.
We represent these functional dependencies
by:


Parcel-ID -> Alderman
Parcel-ID -> Tship-ID
Parcel-ID -> Tship_name
Parcel-ID -> Thall_add


These functional dependencies are
incorporated in a new table named Land
Records 1 in Figure 8-27.


Figure 8-27: Ownership data, converted to second normal form. The figure shows the Land Records
table in first normal form (1NF), the given functional dependencies, and the second normal form (2NF)
tables, Land Records Tables 1, 2, and 3.

Given functional dependencies:

Parcel-ID -> Alderman, Tship-ID
Tship-ID -> Tship_name, Thall_add
Own-ID -> Own_name, Own_add


Second, note that once Own-ID is
specified, the Own_name and Own_add are
determined. Each owner has a unique identifier
and only one name (aliases not allowed).
Also, each owner has only one permanent
home address. Own_name and Own_add are
functionally dependent on Own-ID. The
functional dependencies are:


Own-ID -> Own_name
Own-ID -> Own_add


The dependencies on Parcel-ID and Own-ID are
called partial functional dependencies,
because each involves only part of the primary
key, and the two parts aren't dependent on
each other. If I have a unique Parcel-ID, I
know additional information about some of
the columns for any row in the table, but not
all of the columns. If I know the Own-ID, I
also know the values of a set of columns, but
again, not all. When we have a concatenated
key, we must identify these in our data, and
they guide us in how to further split our table.


How do we get to 2NF? By projecting
the 1NF table across the primary key and
functional dependencies. Remember,
project is just a way of saying we subset the
columns, here guided by the functional
dependencies. These partial functional
dependencies are represented in the tables
Land Records 1 and Land Records 2 in
Figure 8-27.


Finally, note that we need to tie the
owners to the parcels. These relationships are
presented in the table Land Records 3 in
Figure 8-27. Note that some parcels are jointly
owned, and so there are multiple owner IDs
for some parcels.
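

The projection step described above can be sketched in Python. This is an illustrative sketch, not the procedure used by any particular database system: the helper name project is an assumption, and every value in the sample rows is hypothetical apart from the column names and the owner names Devin and Prestovic, which follow the text.

```python
def project(rows, columns):
    """Subset the columns and drop duplicate rows, keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            out.append(dict(zip(columns, projected)))
    return out


COLUMNS = ["Parcel-ID", "Alderman", "Tship-ID", "Tship_name",
           "Thall_add", "Own-ID", "Own_name", "Own_add"]

# Hypothetical 1NF rows: parcel 2303 is jointly owned; owner 122 owns two parcels.
raw = [
    (2303, "Johnson", 12, "Birch", "15 W Elm", 122, "Devin", "123 Pine"),
    (2303, "Johnson", 12, "Birch", "15 W Elm", 457, "Prestovic", "12 Oak"),
    (618, "Larsen", 14, "Grant", "35 E Oak", 122, "Devin", "123 Pine"),
]
land_1nf = [dict(zip(COLUMNS, r)) for r in raw]

# One table per functional dependency, plus a table tying owners to parcels.
records_1 = project(land_1nf, ["Parcel-ID", "Alderman", "Tship-ID",
                               "Tship_name", "Thall_add"])
records_2 = project(land_1nf, ["Own-ID", "Own_name", "Own_add"])
records_3 = project(land_1nf, ["Parcel-ID", "Own-ID"])

print(len(records_1), len(records_2), len(records_3))
```

The parcel and owner details are each stored once after projection, while the third table retains one row per parcel and owner pair.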


The three tables Land Records 1 through
3 satisfy the conditions of a second normal
form. Second normal form eliminates some
of the redundancies associated with the 1NF.
Note that the redundancy in storing the
information on Alderman, Tship-ID,
Tship_name, and Thall_add has been
significantly reduced, and the minor
redundancy in Own_name has also been removed.
Editing the tables becomes easier; for
example, changes in Alderman entail modifying
fewer records. Finally, deletion of a parcel
does not have the side effect of deleting the
information on the owner, Own-ID,
Own_name, and Own_add.


The Third Normal Form 


The 2NF still contains problems,
although they are small compared to a table
in 1NF. Tables in 2NF can still suffer from
transitive functional dependencies. If a
transitive functional dependency exists in a
table, then there is a chain of dependencies. A
transitive dependency occurs in our example
table named Land Records 1 (Figure 8-27). Note
that Parcel-ID specifies Tship-ID, and
Tship-ID specifies Tship_name, Thall_add, and
Alderman. In our notation of functional
dependencies:


Parcel-ID -> Tship-ID

and

Tship-ID -> Tship_name, Thall_add,
Alderman


This causes a problem when we delete a
parcel from the database. To delete a parcel
we remove the parcel from tables Land
Records 1 and Land Records 3. In so doing,
we might also lose the relationship among
Tship-ID, Tship_name, Thall_add, and
Alderman. To avoid these problems we need to
convert the tables to the third normal form.
A table is in the third normal form (3NF)
if and only if for every functional
dependency A -> B, A is a super key, or B is a
member of a candidate key. This requirement
means we must identify transitive functional
dependencies and remove them, typically by
splitting the table that contains them. The
tables Land Records 2 and Land Records 3
in Figure 8-27 are already in the 3NF,
because the keys for these tables are super
keys. Owner-ID uniquely identifies the rest
of the row in Land Records 2, and the
concatenated key of Parcel-ID and Own-ID
uniquely identifies the rows in Land Records 3.


However, the table Land Records 1 in
Figure 8-27 is not in 3NF because the
functional dependencies for table Land Records 1
are:


Parcel-ID -> Tship-ID
Tship-ID -> Tship_name, Thall_add


Tship-ID is not a super key for the table
(it does not uniquely identify the row), nor
are Tship_name and Thall_add members of a
candidate key for that table. Removing
the transitive functional dependency by
splitting the table will create two new tables,
each of which satisfies the criteria for the
3NF. Figure 8-28 contains the tables Land
Records 1a and Land Records 1b, both of
which now satisfy the 3NF criteria, and
preserve the information contained in the 1NF
table in Figure 8-26. Note that Parcel-ID is
now a super key for Table 1a and Tship-ID is
a super key for Table 1b, so the 3NF criteria
are satisfied.


Figure 8-28: Land records, third normal form. Tables shown: Land Records 1a, Land Records 1b,
Land Records 2, and Land Records 3.
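

A minimal Python sketch of this split follows, under stated assumptions: the helper project and all row values are hypothetical, and the Alderman column is left out because the text lists it under both Parcel-ID and Tship-ID. The sketch projects a 2NF table into two tables, one keyed on Parcel-ID and one keyed on Tship-ID, then shows that deleting a parcel no longer removes what is known about its township.

```python
def project(rows, columns):
    """Subset the columns and drop duplicate rows, keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            out.append(dict(zip(columns, projected)))
    return out


# Hypothetical 2NF table with the chain Parcel-ID -> Tship-ID -> township details.
land_records_1 = [
    {"Parcel-ID": 2303, "Tship-ID": 12, "Tship_name": "Birch", "Thall_add": "15 W Elm"},
    {"Parcel-ID": 618, "Tship-ID": 14, "Tship_name": "Grant", "Thall_add": "35 E Oak"},
    {"Parcel-ID": 9473, "Tship-ID": 12, "Tship_name": "Birch", "Thall_add": "15 W Elm"},
]

# Split at the transitive dependency: 1a is keyed on Parcel-ID, 1b on Tship-ID.
records_1a = project(land_records_1, ["Parcel-ID", "Tship-ID"])
records_1b = project(land_records_1, ["Tship-ID", "Tship_name", "Thall_add"])

# Deleting the only parcel in township 14 leaves the township record intact.
records_1a = [r for r in records_1a if r["Parcel-ID"] != 618]
township_14 = [r for r in records_1b if r["Tship-ID"] == 14]
print(records_1a)
print(township_14)
```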


A general goal in defining a relational
database structure is to have the fewest
tables possible that contain the important
relationships and to have all tables in at least
3NF. Normal forms higher than three have
been described and provide further
advantages; however, these higher forms are often
more limited in their application and depend
on the intended use of the database.


While relational tables in normal forms
have certain useful characteristics, and
relational databases are the standard tool
adopted by most large organizations, they
may suffer from relatively long access times
for specific queries. Databases may be
denormalized to speed the
most common processes, particularly when
databases are quite large and high speed
access is paramount. These
denormalizations typically add extra columns or
permanent joins to the database structure. This
may add redundancy or move a table to a
lower normal form, but these disadvantages
often allow significant gains in processing
speed. The need to denormalize tables has
diminished with improvements in computing
power. However, denormalization may be
required for extremely large databases, or
where access speed is of primary
importance.


Summary 


Attribute data are an important
component of spatial data in a GIS. These data may
be organized in several ways, but data
structures that use relational tables have become
the most common method for organizing and
manipulating attribute data in GIS.


Selections, or queries, are among the
most common analyses conducted on
attribute data. Queries mark a subset of records
in a table, often as a precursor to subsequent
analyses. Queries may use AND, OR, and
NOT operations, among others, alone or in
combination.


Keys are important in structuring
relational data tables. Primary and foreign keys
are defined which are used in joining tables.
Primary keys may sometimes be
concatenated, or formed from several columns,
rather than just from one column. Entry and
manipulation of key values is often
constrained so that tables may function properly.


Relational tables are often placed in
normal forms to improve correctness and
consistency, to remove redundancy, and to ease
updates. Normal forms seek to break large
tables into small tables that contain simple
functional dependencies. This significantly
improves the maintenance and integrity of
the database. Normal forms may cause some
cost in speed of access, although this is a
diminishing problem as computer hardware
improves.


Object-relational database systems have
been developed that incorporate the strong
typing and domains of object-oriented
models with the flexibility, logic, and ubiquity of
relational data models. These evolutionary
improvements to the relational approach will
continue as database technologies are
extended across networks of computers and
the World Wide Web.


Study Questions

8.1 - What are the main components of a database management system?
8.2 - What are the primary functions of a database management system?
8.3 - Describe the difference between single and multiple user views.
8.4 - What is a one-to-one relationship between tables? A many-to-one relationship?
8.5 - Which single columns in the following table may serve as keys?


PID | оге | се [umr | зат 
1]e | «|1 [ss 
aje[we[2[s 
[л [е [а [ss 
?[с[=[з][% 
sje[«e[:]e 

w[x|w|s]e 


8.6 - Which single columns in the following table may serve as keys? 


iD [ismo] ose | се [wor [ хет 
1 [w| e |a| 1 | 10 
3 [ual o |y | 5 [гю 
s [ual a | CR ERE] 
7 [e| ¢ [ ЕСЕТ 
o [an| F |u | o [n0 
ма |2259| н | w | 2 [гю 


8.7 - Why have relational database structures proven so popular? 


8.8 - What are the eight basic operations formally defined by E.F. Codd for the rela- 
tional model? 


8.9 - What is the primary reason that hybrid database models are used for spatial
data?


8.10 - Does an OR condition result in more, fewer, or the same number of records
as the component parts? For example, is the set from:


condition A OR condition B


the same, bigger, or smaller than the set from condition A alone, or condition B
alone?


8.11 - Does an AND condition result in more, fewer, or the same number of records
as the component parts? For example, is the set from:


condition A AND condition B


the same, bigger, or smaller than the set from condition A alone, or condition B
alone?


8.12 - Identify the states meeting each of the following selection criteria, based on the
table below. Note that these data are for the 1990s, and may not reflect conditions
today:

a) Smokers < 20%

b) Smokers > 20% and illiteracy < 10

c) Not (non-federal taxes > 9)

d) Illiteracy < 7 or income > 22,000

e) Get more federal aid than paid in taxes, and non-federal taxes > 9

f) [Firearms deaths < 10 and income > 21,000] and not [smokers > 20]

Frearm 


Smokers Income  Iwreracy воа Non-Federü Fed Aid / 
FIPS Nome (X) (рези)  (&) _ 100000 Tex Rote (x) _ Fed Taxes 


ol | Nasoma | 222] mue | ı5 СЕЕ 1л 
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ı2 [rows [us | ms | го m 7 шг 
8.13 - Identify the countries from the following table (data from the 1990s) that meet
the following criteria.


a) Per capita energy use > 4,000 and population < 20,000,000
b) Infant mortality < 7 and life expectancy > 79.0

c) Per capita energy use < 4,000 or [(population > 40 million) and (car theft < 1)]
d) [Per capita energy use < 4,000 or (population > 40 million)] and (car theft < 1)
e) not (population > 40,000,000)

f) Population < 20,000,000 and not (car theft > 1.5)


Infont ше Cor 
Population Energy Use Mortality expect. Theft 

Country _ (milons) (Ио/ре) (per 1000) _ (years) (x) 
Australi 199 | see 4 792 22 
Britain sea | 5945 5 75 26 
Finland 52 | вазе 4 780 05 
France 597 | 4350 4 792 18 
Japon 1272 m 3 816 o1 
Netherlands 162 | 5993 5 783 05 
Norway 46 | 6019 4 789 15 
South Africa 453 | 3703 52 465 24 
Span ап | 295 5 783 05 
USA 2910 | 8006 7 73 | 05 


8.14 - What are normal forms in relational databases? Why are they used, and what
are the advantages of putting data in higher normal forms?

8.15 - Sketch the output table resulting from an inner join shown below:


TM 
idi po: Id? tm 
y [wo] x[s 
[2 [=] [yfi] 
Ale] [ale 
у |е] Гаја 
v|m| [nts 
R|rt Lle 
a [m 


8.16 - Sketch an outer join for the table shown in the previous problem.
8.17 - Define the basic differences between first, second, and third normal forms.
8.18 - Give an example of a functional dependency.




8.19 - List the single-column functional dependencies in the following table, using
the arrow notation described in this chapter.


ID | Size   | Shape   | Color | Age | Source
1  | large  | round   | blue  | 10  | A
2  | medium | round   | green | 5   | B
3  | small  | round   | red   | 10  | C
4  | medium | knobbed | green | 5   | D
5  | medium | knobbed | green | 5   | E
6  | large  | round   |       | 10  | F
7  | large  | round   |       | 10  | A


9 Basic Spatial Analyses 


Introduction 


Spatial data analysis is the application
of operations to coordinate and
related attribute data, often to solve a
problem. We may wish to identify high
crime areas, to identify road segments that
need repaving, or to find the best area for
wind turbines. There are hundreds of
spatial operations or spatial functions used in
spatial analysis, and all involve
calculations.

Spatial operations are often applied
sequentially to solve a problem, the
output of each spatial operation serving as
the input of the next (Figure 9-1). Part of
the challenge in geographic analysis is
selecting appropriate spatial operations
and applying them in the appropriate
order.


The table manipulations we described
in Chapter 8 are included in our definition
of a spatial operation. Indeed, the
selection and modification of attribute data in
spatial data layers are included at some
time in nearly all complex spatial
analyses. Many operations incorporate both the
attribute and coordinate data, and the
attributes must be further selected and
modified in the course of a spatial
analysis.

The discussion in the present chapter
will expand on rather than repeat the
selection operations treated in Chapter 8.
This chapter describes spatial data
analyses that involve sort, selection,
classification, and spatial operations that are
applied to both coordinate and associated
attribute data.


Input, Operations, and Output 

Spatial data analysis typically
involves using data from one or more
layers to create output. The analysis may
consist of a single operation applied to a
data layer, or many operations that
integrate input data from many layers to
create the desired output.

Spatial operations may also require
several inputs or generate many outputs.
Terrain analysis functions may take a raster
grid of elevations as an input data layer and
produce both slope (local steepness) and
aspect (the slope direction). In this case,
two outputs are generated for each input
elevation data set. Other operations may use
multiple input layers to produce a single output
layer, for example, annual rainfall found by
summing 12 monthly rainfall raster layers.

The output from a spatial operation 
may be spatial, creating new spatial data 
layers, or nonspatial, producing scalar val- 
ues or a table. A layer average function 
may simply calculate the mean cell value 
found in a raster data layer. The input is a 
spatial data layer, but the output is a single 
number. 
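
The two cases above, many input layers producing one output layer and one input layer producing a single number, can be sketched with rasters stored as nested Python lists. This is a minimal sketch only; the function names sum_layers and layer_mean and the rainfall values are hypothetical, and GIS software would normally use dedicated raster structures.

```python
def sum_layers(layers):
    """Cell-by-cell sum of equally sized rasters (each a list of rows)."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [[sum(layer[r][c] for layer in layers) for c in range(n_cols)]
            for r in range(n_rows)]


def layer_mean(raster):
    """Mean cell value of one raster: spatial input, scalar output."""
    cells = [value for row in raster for value in row]
    return sum(cells) / len(cells)


# Twelve hypothetical 2 x 2 monthly rainfall rasters.
monthly = [[[m + 1, 2 * (m + 1)], [0, 10]] for m in range(12)]

annual = sum_layers(monthly)   # spatial output: a new raster layer
mean_annual = layer_mean(annual)  # nonspatial output: one number
print(annual)
print(mean_annual)
```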


Scope 

Spatial data operations may be
characterized by their spatial scope, the extent of
area of the input data that are used in
determining the values at output locations
(Figure 9-2). Spatial operations may be
characterized as local, neighborhood, or
global, to reflect the extent of the input area
used to determine the value at a given
output location.

Local operations use only the data at
one input location to determine the value at
that same output location (Figure 9-2, top).
Attributes or values at adjacent locations
are not used in the operation.

Neighborhood operations use data
from both an input location plus nearby
locations to determine the output value
(Figure 9-2, center). The extent and relative
importance of values in the nearby region
may vary, but the value at an output
location is influenced by more than just the
value of data found at the corresponding
input location.


Global operations use data values from
the entire input layer to determine each
output value. The value at each location
depends in part on the values at all input
locations (Figure 9-2, bottom).
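
The three scopes can be contrasted with a small raster example. The sketch below assumes a raster stored as nested Python lists; the three functions and the particular calculations they perform (doubling a cell, a 3 x 3 window mean, and a cell's share of the layer total) are hypothetical choices made only to show which input cells each scope reads.

```python
def local_op(raster, r, c):
    """Local: uses only the value at the same location."""
    return raster[r][c] * 2


def neighborhood_op(raster, r, c):
    """Neighborhood: mean of the 3 x 3 window around the cell, clipped at edges."""
    rows, cols = len(raster), len(raster[0])
    window = [raster[i][j]
              for i in range(max(0, r - 1), min(rows, r + 2))
              for j in range(max(0, c - 1), min(cols, c + 2))]
    return sum(window) / len(window)


def global_op(raster, r, c):
    """Global: the cell value as a fraction of the total over the whole layer."""
    total = sum(sum(row) for row in raster)
    return raster[r][c] / total


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

print(local_op(grid, 1, 1))
print(neighborhood_op(grid, 1, 1), neighborhood_op(grid, 0, 0))
print(global_op(grid, 1, 1))
```

Changing a single cell in the grid alters one local output, the neighborhood outputs of nearby cells, and every global output.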

The set of available spatial operations
depends on the data model and type of
input spatial data. Some operations may be
easily applied to raster or vector data.
While the details of the specific
implementation may change, the concept of the
operation does not. Other operations may be
possible in only one data model.

Data model characteristics will
determine how any given operation is applied.
The specific implementation of many
operations, particularly multilayer operations,
depends on the specific data model. A
raster operation may produce a different
outcome than a vector operation, even if the
themes are meant to represent the same
features. In a like manner, the specific set and
sequence of operations in a spatial analysis
will depend on the data model used and the
specific operations available in the GIS
software.

Spatial scope provides a good example
of this influence of data models. Cells in a
raster data set have uniform size and shape.
A local operation applied to a raster data
layer has a well-defined, repeatable area. In
contrast, polygons usually vary in size and
shape. A local operation for a vector
polygon data set is likely to have variable size
and shape from one location to another. In
Figure 9-2, the local operation follows a
state boundary. Therefore, the operation
applies to a different size and shape for
each state.

Neighborhood analyses are affected by
the shape of adjacent states in a similar
manner. Summary values such as
populations of adjacent states may be greatly
influenced by changes in neighborhood
size, so great care must be taken when
interpreting the results of a spatial
operation. Knowledge of the algorithm behind
the operation is the best aid to interpreting
the results.


Some operations are quite easy to program when using
raster data models, and can be quite
difficult when using vector data models. In such
cases we may convert the data between
models and apply the desired operations
and, if necessary, convert the results back
to the original data model.

Selection and Classification


Selection operations identify features
that meet one or more conditions or
criteria. In these operations, attributes or
geometry of features are checked against criteria,
and those that satisfy the criteria are
selected. These selected features may then
be written to a new output data layer, or the
geometry or attribute data may be
manipulated in some manner.

Figure 9-3 shows an example of a
selection operation that involves the
attributes of a spatial data set. Two conditions
are applied, and the features that satisfy
both conditions are included in the selected
set. This example shows the selection of
those states in the "lower 48" United States
that are a) entirely north of Arkansas, and
b) have an area greater than 84,000 km².
Unselected states and a shaded Arkansas
are shown at the top of the figure. The next
two maps of Figure 9-3 show those states
that match the individual criteria: those
states entirely north of Arkansas, and those
states that are greater than 84,000 km². The
bottom panel shows those states that satisfy
both conditions. This figure illustrates two
basic characteristics of selection
operations. First, there is a set of features that are
candidates for selection, and second, these
features are selected based on one or
some combination of the geographic and
attribute data.
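
A two-condition selection of this kind can be sketched in Python. The sketch is illustrative only: the features P through S, their southern-boundary latitudes and areas, and the reference latitude are hypothetical placeholders rather than real state data, and only the 84,000 km² area threshold comes from the example above. A full implementation would test the geometry of each feature against that of the reference feature.

```python
# Hypothetical features: the southernmost latitude of each and its area.
features = [
    {"name": "P", "south_lat": 43.5, "area_km2": 225000},
    {"name": "Q", "south_lat": 40.4, "area_km2": 60000},
    {"name": "R", "south_lat": 33.0, "area_km2": 150000},
    {"name": "S", "south_lat": 37.0, "area_km2": 90000},
]

REFERENCE_NORTH_LAT = 36.5  # assumed northern edge of the reference feature
MIN_AREA_KM2 = 84000        # area threshold used in the example above

# Condition a): the feature lies entirely north of the reference feature.
north = {f["name"] for f in features if f["south_lat"] > REFERENCE_NORTH_LAT}

# Condition b): the feature is larger than the area threshold.
large = {f["name"] for f in features if f["area_km2"] > MIN_AREA_KM2}

# The selected set contains the features that satisfy both conditions.
both = north & large
print(sorted(north), sorted(large), sorted(both))
```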

An on-screen query is a common mode
of selection. A data layer is displayed and
features are selected by a human operator,
often with a mouse click or other pointing
device. On-screen query is used to gather
information about specific features, and is
often used for interactive updates of
attribute or spatial data. For example, it is
common to set up a process such that when a
feature is selected, the attribute information
for the feature is displayed. These attribute
data may then be edited and the changes
saved.


Queries may also be specified by
applying conditions solely to the aspatial
components of spatial data, as described in
Chapter 8. Selection subsets the data to
records of interest, and the selected data are
typically processed in some way, often
saved to a separate file, deleted, or changed
in some manner.


Selection operations on tables were
described in general in Chapter 8. The
description here expands on that
information and draws attention to specific
characteristics of selections applied to spatially
related data. Table selections have spatial
relevance because each record in a table is
associated with a geographic feature.
Selecting a record in a table simultaneously
selects the associated spatial features: cells,
points, lines, or areas. Spatial selections
may be combined with table selections to
identify a set of selected geographic
features.


Set Algebra 

Selection conditions are often
formalized using set algebra. Set algebra uses the
operations less than (<), greater than (>),
equal to (=), and not equal to (<>). These
selection conditions may be applied either
alone or in combination to select features
from a set.


Figure 9-4 shows four set algebraic
expressions and the selection results for a
set of counties in the northeastern United
States. The upper two selections show
equal to (=) and not equal to (<>)
selections. The upper left shows all counties
with a value for the attribute state that
equals Vermont, while the upper right
shows all counties with a value for state
that are not equal to New York. The lower
selections in Figure 9-4 show examples of
ordinal comparisons. The left figure shows
all counties with a population > 1000,
while the right figure shows all counties
with a density less than (<) 250 persons per
mi².


Figure 9-3: An example of a selection operation based on single or multiple conditions. Panels show
states entirely north of Arkansas, states larger than 84,000 sq. km, and states both entirely north of
Arkansas and larger than 84,000 sq. km.


The set algebra operations greater than
(>) or less than (<) may not be applied to
nominal data, because there is no implied
order in nominal data. Green is not greater
than yellow, and red is not less than blue.
Only the set algebra operations equal to (=)
and not equal to (<>) may be applied to
nominal variables. All set algebra operations
may be applied to ordinal data, and are
often applied to interval/ratio data.
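
The four set algebra operations, and the restriction on nominal data, can be sketched in Python. This is a hedged illustration: the select function, its nominal flag, and the three county records are hypothetical, and the comparison functions come from Python's standard operator module.

```python
import operator

# Map the set algebra symbols used in the text to comparison functions.
COMPARISONS = {"<": operator.lt, ">": operator.gt,
               "=": operator.eq, "<>": operator.ne}


def select(records, column, op, value, nominal=False):
    """Return the records for which `column op value` is true."""
    if nominal and op in ("<", ">"):
        raise ValueError("order comparisons are undefined for nominal data")
    compare = COMPARISONS[op]
    return [r for r in records if compare(r[column], value)]


# Hypothetical county records.
counties = [
    {"county": "a", "state": "Vermont", "density": 60},
    {"county": "b", "state": "New York", "density": 900},
    {"county": "c", "state": "Maine", "density": 40},
]

in_vermont = select(counties, "state", "=", "Vermont", nominal=True)
not_new_york = select(counties, "state", "<>", "New York", nominal=True)
low_density = select(counties, "density", "<", 250)
print([r["county"] for r in in_vermont])
print([r["county"] for r in not_new_york])
print([r["county"] for r in low_density])
```

Asking for an order comparison on the nominal state column raises an error here, which mirrors the rule that only equal to and not equal to apply to nominal variables.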


Boolean Algebra 


Boolean algebra uses the conditions
OR, AND, and NOT to select features. Boolean
expressions are most often used to combine
set algebra conditions and create
compound selections. The Boolean expression
consists of a set of Boolean operators,
variables, and perhaps constants or scalar
values.

Boolean expressions are evaluated by
assigning an outcome, true or false, to each
condition. Figure 9-5 shows three
examples of Boolean expressions. The first is an
expression using a Boolean AND, with two
arguments for the expression. The first
argument specifies a condition on a
variable named area, and the second argument
a condition on a variable named
farm_income. Features are selected if they satisfy
both arguments, that is, if their area is
larger than 100,000 AND farm_income is less
than 10 billion.

Expression 2 in Figure 9-5 illustrates a Boolean NOT expression. This condition specifies that all features with a variable state which is not equal to Texas will return a true value, and hence be selected. NOT is also often known as the negation operator. This is because we might interpret the application of a NOT operation as exchanging the selected set for the unselected set. The argument of expression 2 in Figure 9-5 is itself a set algebra expression. When applied to a set of features, this expression will select all features for which the variable state is equal to the value Texas. The NOT operation reverses this, and selects all features for which the variable state is not equal to Texas.

Boolean expressions
1. (area > 100,000) AND (farm_income < 10 billion)
2. NOT (state = Texas)
3. [(rainfall > 1,000) AND (taxes = low)] OR [(housing cost = low) AND NOT (crime = high)]

Figure 9-5: Examples of Boolean expressions.

The third expression in Figure 9-5 shows a compound Boolean expression, combining four set algebra expressions with AND, OR, and NOT. This example shows what might be a naive attempt to select areas for retirement for someone interested in areas that have high rainfall and low taxes (a gardener on a fixed income), or low housing cost and low crime.

The spatial outcomes of specific Boolean expressions are shown in Figure 9-6. The figure shows three overlapping circular regions, labeled A, B, and C. Areas may fall in more than one region: for example, the center, where all three regions overlap, is in A, B, and C. As shown in the figure, Boolean AND, OR, or NOT may be used to select any combination or portions of these regions.


OR conditions return a value of true if either argument is true. Areas in either region A or region B are selected at the top center of Figure 9-6. AND requires the conditions on both sides of the operation be met; an AND operation results in a reduced selection set (top right, Figure 9-6). NOT is the negation operator; it flips the effect of the previous operations, turning true to false and false to true. The NOT shown in the lower left portion of Figure 9-6 returns the area that is only in region C. Note that this is the converse, or opposite, of the set that is returned when using the comparative OR, shown in the top center of Figure 9-6. The NOT operation is often applied in combination with the AND Boolean operator, as shown at the bottom center of Figure 9-6. This selects the converse (or complement) of the corresponding AND. Compare the bottom center selection to the top right selection in Figure 9-6. NOTs, ANDs, and ORs may be further combined to select specific combinations of areas, as shown in the lower right of Figure 9-6.

Note that as with table selection discussed in Chapter 8, the order of application of these Boolean operations is important. In most cases, you will not select the same set when applying the operations in a different order. Parentheses, brackets, or other delimiters should be used to specify the order of application. The expression A AND B OR C will give different results when interpreted as (A AND B) OR C, as shown in Figure 9-6, than when interpreted as A AND (B OR C). Verify this as an exercise. Which areas does the second Boolean expression select?

[Figure 9-6 panel label, partly legible: (A AND B) OR C]
[Figure 9-7 label: (County = Rice) AND (Wshed = Cannon)]
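
One way to check the effect of grouping is with Python sets. The sketch below is an illustration under stated assumptions: the three overlapping regions are reduced to seven hypothetical zones, each labeled by the regions that cover it, and AND, OR, and NOT are modeled as intersection, union, and complement.

```python
# Label each of the seven zones in a three-circle diagram by the regions
# that cover it, e.g. "ab" lies in A and B but not in C.
zones = {"a", "b", "c", "ab", "ac", "bc", "abc"}

A = {z for z in zones if "a" in z}
B = {z for z in zones if "b" in z}
C = {z for z in zones if "c" in z}

# AND is set intersection (&), OR is union (|), NOT is the complement.
left_grouping = (A & B) | C      # (A AND B) OR C
right_grouping = A & (B | C)     # A AND (B OR C)
not_a_or_b = zones - (A | B)     # NOT (A OR B)

print(sorted(left_grouping))
print(sorted(right_grouping))
print(sorted(not_a_or_b))
```

The two groupings return different zone sets, and the complement of A OR B leaves only the zone that lies in C alone.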

Figure 9-7 shows a real-world example of a Boolean selection. Counties often must identify areas for pollution reduction, in this case a portion of the Cannon River, a tributary of the Mississippi River. Counties are labeled, with boundaries shown as solid lines. The Cannon River watershed is shown as thicker lines. A Boolean AND operation was applied to a data layer containing both watershed and county boundaries, selecting the areas that are both within the Cannon River watershed and within Rice County.


Spatial Selection Operations

Many spatial operations select sets of features. These operations are applied to a spatial data layer and return a set of features that meet a specified condition. Adjacency and containment are commonly used spatial selection operations.


Adjacency selection operations are used to identify those features that “touch” other features. Features are typically considered to touch when they share a boundary. A target or key set of features is identified, and all features that share a boundary with the target features are placed in the selected set.


Figure 9-8a shows an example of selection based on polygon adjacency. The state of Missouri is shaded on the left portion of Figure 9-8a, and states adjacent to Missouri are shaded on the right portion of Figure 9-8a. States are selected because they include a common border with Missouri.


There are many ways the shared border may be detected. With a raster data layer, an exhaustive cell-by-cell comparison may be conducted to identify adjacent pairs with different state values. Vector adjacency may be identified by examining the topological relationships (see Chapter 2 for a discussion of topology). Line and polygon topology typically records the polygon identifiers on each side of a line. All lines with Missouri on one side and a different state on the other side may be flagged, and the list of states adjacent to Missouri extracted.


Adjacency is defined in Figure 9-8a as sharing a boundary for some distance greater than zero. Figure 9-8b shows how a different definition of adjacency may affect selection. The left of Figure 9-8b shows the state of Arizona and a set of adjacent (shaded) western states. By the definition of adjacency used in Figure 9-8a, Arizona and Colorado are not adjacent, because they do not share a boundary along a line segment. Arizona and Colorado share a border at a point, called Four Corners, where they join with Utah and New Mexico. When a different definition of adjacency is used, with a shared node qualifying as adjacent, then Colorado is added to the selected adjacent set (right, Figure 9-8b). This is another illustration of an observation made earlier: there are often several variations of any single spatial operation. Care must be taken to test the operation under controlled conditions until the specific implementation of a spatial operation is well understood.
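
The two adjacency definitions can be contrasted with a small topology sketch. The arc and node tables below are hypothetical and hold only a handful of records; they assume each boundary line stores the polygon on its left and right side, and that a node table lists the polygons meeting at a point.

```python
# Hypothetical arc table: each boundary line stores the polygon on its
# left and right side, as in a simple line-polygon topology.
arcs = [
    ("Arizona", "Utah"),
    ("New Mexico", "Arizona"),
    ("Arizona", "California"),
    ("Nevada", "Arizona"),
    ("Utah", "Colorado"),
    ("Colorado", "New Mexico"),
]

# Hypothetical node table: polygons that meet at each named node.
nodes = {"Four Corners": {"Arizona", "Colorado", "New Mexico", "Utah"}}


def adjacent_by_line(target, arcs):
    """Polygons that share at least one boundary line with the target."""
    found = set()
    for left, right in arcs:
        if left == target and right != target:
            found.add(right)
        elif right == target and left != target:
            found.add(left)
    return found


def adjacent_by_node_or_line(target, arcs, nodes):
    """Polygons that share a boundary line or a single node with the target."""
    found = adjacent_by_line(target, arcs)
    for polygons in nodes.values():
        if target in polygons:
            found |= polygons - {target}
    return found


print(sorted(adjacent_by_line("Arizona", arcs)))
print(sorted(adjacent_by_node_or_line("Arizona", arcs, nodes)))
```

Colorado appears only in the second result, which is the difference between the two panels described above.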

Features may be selected based on proximity. Proximity selection typically requires a set of selecting features and a set of target features. All the target features within a specified distance of the selecting features will be chosen, e.g., all weather stations within 60 km of a watershed may be selected to estimate rainfall (Figure 9-9). Selecting only stations within the watershed may provide poor estimates of rainfall near the edge of the polygon, but using all gages is inefficient because information from gages very far away often isn’t helpful. An adequate exterior proximity is chosen, usually by visual inspection, previous experience, or preliminary tests, and the proximity selection applied. Proximity is usually calculated implicitly within the operation, in that first the source features and selectable features are provided, a proximity distance and method specified, and on running the operation, the selected set is identified. Less frequently, software requires a multi-step process in which the source features and selection layers are identified, and a selection layer is created. This created layer usually contains polygons defining the area within the specified distance of the source features. This polygon layer is then used to select features from the target set.

Figure 9-8: Examples of selection based on adjacency. a) Missouri, USA, and adjacent states. b) Arizona and adjacent states, with panel titles "Adjacency: shared line required" and "Adjacency: shared node or line required." Different selections result when node adjacency is accepted (right).

There are several variants of proximity selection. One variant selects all features that are at least partially within a given distance of a set of features. Another variant selects only features that are entirely inside a given distance of the outer boundaries of a set of polygons. A third variant selects only those features that are entirely within a given distance of a set of polygons, but not those that are within the set of polygons themselves. Users should clearly identify the selection tool function and outputs.
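
A minimal sketch of two of these variants follows. It makes strong simplifying assumptions: the watershed is reduced to an axis-aligned rectangle, the gages are points, and coordinates are in kilometers. All names and values are hypothetical; real software would measure distance to the true polygon boundary.

```python
import math


def distance_to_box(point, box):
    """Distance from a point to a rectangle; 0.0 if the point is inside or on it."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def select_within(points, box, distance, include_interior=True):
    """Select point names within `distance` of the box.

    With include_interior=False, points inside the box are skipped,
    which mimics the "exterior only" variant.
    """
    selected = []
    for name, point in points.items():
        d = distance_to_box(point, box)
        if d <= distance and (include_interior or d > 0.0):
            selected.append(name)
    return selected


# Hypothetical watershed extent and rainfall gages (km).
watershed = (0.0, 0.0, 10.0, 10.0)
gages = {
    "g1": (5.0, 5.0),      # inside the watershed
    "g2": (12.0, 5.0),     # 2 km outside
    "g3": (80.0, 80.0),    # far away
    "g4": (-30.0, -40.0),  # 50 km from the nearest corner
}

print(select_within(gages, watershed, 60.0))
print(select_within(gages, watershed, 60.0, include_interior=False))
```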

Proximity selection, and most selection processes more generally, typically only select features that meet a given set of criteria. The process often does not create a separate, new data layer of the selected features. Selected features are marked on-screen and in the corresponding data table but still are part of the source data set. There is often an additional export step if these selected features are destined for a new data set. This explicit creation approach is usually adopted because selection is often a multi-step process, with various different selection tools applied successively to arrive at a target set of features.

We often add indicator or classification variables to our data tables when we have a complex set of selection criteria, particularly when combining both spatial and tabular selections. These indicator variables record the membership of features in groups that match or don't match a set of conditions. Figure 9-10 illustrates a selection for a set of counties that both contain part of a target watershed and hazardous materials (Hazmat) storage sites. A municipality might undertake an upstream inventory if they measure a spike in toxic chemicals in their water supply. Hazmat sites are distributed throughout the state of Georgia. The Altamaha River drains much of the central portion of the state. A spatial selection identifies counties that contain a portion of the watershed, and a table selection identifies those counties containing Hazmat sites. A column may be added to the county table and values assigned to record this set of potential source counties, to be used in subsequent analysis.

Figure 9-9: Selection of rainfall gages near a watershed.

Figure 9-11 shows one flowchart of the geoprocessing analysis in Figure 9-10. While the analysis is relatively simple, and the steps are primarily operations on the tabular data, the flowchart shows the data, spatial operations, and order in a succinct manner. It is good practice to flowchart all multi-step spatial analyses, and the value of the flowchart grows with the complexity of the analysis.
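
The indicator column can be sketched as a simple table update. The county records below are hypothetical, and the `in_watershed` flag stands in for the result of the spatial selection; only the rule itself (1 when both conditions hold, 0 otherwise) follows the description above.

```python
# Hypothetical county table. "in_watershed" stands in for the outcome of
# the spatial selection; "hazmat_sites" is a count from the attribute table.
counties = [
    {"name": "c1", "in_watershed": True, "hazmat_sites": 3},
    {"name": "c2", "in_watershed": True, "hazmat_sites": 0},
    {"name": "c3", "in_watershed": False, "hazmat_sites": 5},
    {"name": "c4", "in_watershed": False, "hazmat_sites": 0},
]

# Add the indicator column: 1 if the county contains part of the watershed
# and at least one Hazmat site, 0 otherwise.
for county in counties:
    both = county["in_watershed"] and county["hazmat_sites"] > 0
    county["InSet"] = 1 if both else 0

selected = [c["name"] for c in counties if c["InSet"] == 1]
print(selected)
```

Because the result is stored as a column, later steps can filter on `InSet` without repeating the spatial selection.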

Caution is helpful when applying subsequent spatial operations to a data set with selected features (like the rainfall gages in the right panel of Figure 9-9) since many subsequent operations are possible. Some operations by default only act on selected features, while other operations apply to the entire data set. The choices are often software-dependent, and vary by operation, and so you should consult the documentation or test each new spatial operation when first using it to avoid unintended results.

Figure 9-10: An example of a combined spatial and table selection. We wish to identify counties that contain both part of the target watershed and Hazmat sites. A column named InSet may be added to identify the selected counties in further processing (bottom center and right).

[Figure 9-11 flowchart elements: County; Watershed; River; HazMat Sites; Create InSet item; InSet = 1 if HazMat > 0, InSet = 0 otherwise; County with HazMat Sites and indicator variable]

Figure 9-11: An example flowchart for the Altamaha HazMat site analysis. While simple, this shows how spatial and tabular operations might be represented graphically.

Containment is another spatial selection operation. Containment selection identifies all features that contain or surround a set of target features. For example, the California Department of Transportation may wish to identify all counties, cities, or other governmental bodies that contain some portion of Highway 99, because they wish to improve road safety. A spatial selection may be used to identify these jurisdictions.

Figure 9-12 illustrates a containment selection based on the Mississippi River in North America. We wish to identify states that contain some portion of the river and its tributaries. A query is placed, identifying the features that are contained, here the Mississippi River network, and the target features that may potentially be selected. The target set in this example consists of the lower 48 states of the United States. All states that contain a portion of the Mississippi River or its tributaries are shaded as part of the selected set.

Figure 9-12: An example of a selection based on containment. All states containing a portion of the Mississippi River or its tributaries are selected.


Classification 


Classification is a spatial data operation that is often used in conjunction with selection. A classification, also known as a reclassification or recoding, will categorize geographic objects based on a set of conditions. For example, all the polygons larger than 1 square mile may be assigned a size value equal to Large, all polygons from 0.1 to 1 square mile may be assigned a size equal to Mid, and all polygons smaller than 0.1 square miles may be assigned a size equal to Small (Figure 9-13). Classifications may add to or modify the attribute data for each geographic object. These modified attributes may in turn be used in further analyses, such as for more complex combinations in additional classification.


Classification may be used for many other purposes. One common end is to group objects for display or map production. These objects have a common property, and the goal is to display them with a uniform color or symbol so the similar objects are identified as a group. The display color and/or pattern is typically assigned based upon the values of an attribute or attributes. A range of display shades may be chosen, and corresponding values for a specific attribute assigned. The map is then displayed based on this classification.


A classification may be viewed as an assignment of features from an existing set of classes to a new set of classes. We identify features that have a given set of values, for example, parcels that are above a certain size, and assign them all a classification value, in this case the class "large." Parcels in another range of sizes may be assigned different class values, for example, "mid" and "small." The attribute that stores the parcel area is used as a guide to assigning the new class value for size.

The assignment from input attribute values (area) to new class values (here, size) may be defined manually, or the assignment may be defined automatically. For manual classifications, the class transitions are specified entirely by the human analyst.

Classifications are often specified by a table or array. The table identifies the input class or values, as well as the output class for each of this set of input values. Figure 9-14 illustrates the use of a classification table to specify class assignment. Input values of A or B lead to an output class value of 1, an input value of E leads to an output value of 2, and a third input value leads to an output value of 3. The table provides a complete specification for each classification assignment.

Figure 9-14 illustrates a classification based on a manually defined table. A human analyst specifies the In items for the source data layer via a classification table, as well as the corresponding output value for each In variable. Out values must be specified for each input value or there will be undefined features in the output layer. Manually defining the classification table provides the greatest control over class assignment. Alternatively, classification tables may be automatically assigned, in that a number of classes may be specified and some rule embodied in a computer algorithm used to assign output classes for each of the input classes.
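
A classification table maps naturally onto a lookup structure. The sketch below is illustrative only: the A, B, and E entries follow the text, while the input code "Z" for output class 3 is a hypothetical stand-in, and returning `None` is one assumed way to represent an undefined feature.

```python
# Classification table: input value -> output class.
# "Z" is a hypothetical input code used here for output class 3.
classification_table = {"A": 1, "B": 1, "E": 2, "Z": 3}


def reclassify(values, table):
    """Map each input value to its output class; None marks an undefined feature."""
    return [table.get(value) for value in values]


# Hypothetical source layer attribute values, one per feature.
source_values = ["A", "E", "B", "Q", "Z"]
output_values = reclassify(source_values, classification_table)
print(output_values)
```

The input "Q" has no entry in the table, so the corresponding output is undefined, which is the situation the text warns about.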

A binary classification is perhaps the simplest form of classification. A binary classification places objects into two classes: 0 and 1, true and false, A and B, or some other two-level classification. A set of features is selected and assigned a value, and the complement of the set, all remaining features in the data layer, is assigned the different binary value.

A binary classification is often used to store the results of a complex selection operation. A sequence of Boolean and set algebra expressions may be used to select a set of features. A specific target attribute is identified for the selected set of features. This target attribute is assigned a unique value. The target attribute is assigned a different value for all unselected features. This creates a column that identifies the selected set; for example, all counties that are small, but with a large population.

For example, we may wish to select states at least partially west of the Mississippi River as an intermediate step in an analysis (Figure 9-15). We may be using this classification in many subsequent spatial operations. Thus, we wish to store this characteristic, whether the state is west or east of the Mississippi River. States are selected based on location and reclassified. We record this classification by creating a new attribute and assigning a binary value to this attribute, 1 for those states that satisfy the criteria, and 0 for those that do not (Figure 9-15). The variable is_west records the state location relative to the Mississippi River. Additional selection operations may be applied, and the created binary variable preserves the information generated in the initial selection.

Figure 9-14: The classification of a thematic layer. Values are given to specific attributes in a classification table, which is used in turn to assign classes in an output layer. Panels: source layer; classification table (In, Out); output layer.

Previous examples have shown vector data, but we may also reclassify raster data. If the input raster values are nominal or ordinal data, the reclassification will look very similar to the vector examples shown in Figure 9-14. A list of input and corresponding outputs is provided, and the reclassification operates on a cell-by-cell basis. When interval/ratio raster data are used as input, then input ranges are required rather than specific input values. This distinction is described in detail in Chapter 10.
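
As a rough illustration of range-based raster reclassification, the sketch below assigns each cell a class from a list of range boundaries. The grid values, the boundaries, and the convention that a value equal to a boundary falls in the higher class are all assumptions made for this example.

```python
from bisect import bisect_right

# Assumed class ranges: class 1 is below 100, class 2 is 100 up to
# (but not including) 200, and class 3 is 200 and above.
range_boundaries = [100, 200]


def reclassify_raster(grid, boundaries):
    """Reclassify an interval/ratio raster cell by cell using input ranges."""
    return [[bisect_right(boundaries, cell) + 1 for cell in row] for row in grid]


# Hypothetical 2 x 2 elevation raster.
elevation = [
    [50, 150],
    [100, 250],
]

print(reclassify_raster(elevation, range_boundaries))
```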


Classification table
state name     is_west
Arizona        1
Arkansas       1
Colorado       1
Connecticut    0
...
Wyoming        1

Figure 9-15: An example of a binary classification. Features are placed into two classes, west (1) and east (0) of the Mississippi River. The classification table codifies the assignment.




Data-defined Classification 


Manually defining the classification table may not always be necessary, and may be tedious or complex. Suppose we wish to assign a set of display colors to a set of elevation values. There may be thousands of distinct elevation values in the data layer, and it would be inconvenient at best to assign each color manually. Data-defined classification methods, where class intervals are automatically derived from rules applied to the input data, are often used in these instances.


An automatic classification uses some rule to specify the input class to output class assignments. The input and output class boundaries are often based on a set of parameters used to guide class definition.

[Figure 9-15 annotation: States west of the main branch of the Mississippi River assigned 1, east of the River assigned 0.]

A potential drawback of an automated class assignment stems from our inability to precisely specify class boundaries. A mathematical formula or algorithm defines the class boundaries, and so specific classes of interest may be split. Thus, the analyst sacrifices precise control over class specification when an automated classification is used.

Figure 9-16 describes a data layer we will use to illustrate automatic class assignment. The figure shows a set of “neighborhoods” with populations that range from 0 to 5,133. We wish to display the neighborhoods and populations in three distinct classes: high, medium, and low population. High will be shown in black, medium in gray, and low in white. We must decide how to assign the categories: what population levels define high, medium, and low? In many applications, the classification levels are previously defined. There may be an agreed-upon standard for high population, and we would simply use this level. However, in many instances the classes are not defined, and we must choose them.

Figure 9-16 includes a bar graph depicting the population frequency distribution; this type of bar graph is commonly called a histogram. The histogram shows the number of neighborhoods that are found in each bar (or "bin") of a set of very narrow population categories. For example, we may count the number of neighborhoods that have a population between 3,000 and 3,100. Approximately 8.1% of the neighborhoods have a population in this range, so a bar 8.1 units high is plotted. We count and plot the histogram values for each of our narrow categories (e.g., the number from 0 to 100, from 100 to 200, from 200 to 300), until the highest population value is plotted.

Figure 9-16: Neighborhoods (1074 polygons) used to illustrate automatic class assignment. Population for neighborhoods ranges from 0 to 5,133 (3 outliers > 3,300). A bar graph shows the frequency of neighborhood population, e.g., 8.1% of the neighborhoods have a population between 3,000 and 3,100. Note there is a break in the chart between 3,300 and 5,000 to show the three neighborhoods with populations above 3,300.
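
The histogram counts described above can be sketched as follows. The population values are hypothetical, and the bin width of 100 follows the narrow categories named in the text; each bin is keyed by its lower bound and holds the percentage of neighborhoods in that bin.

```python
from collections import Counter


def histogram_percent(values, bin_width):
    """Percentage of values in each bin, keyed by the bin's lower bound."""
    counts = Counter(int(value // bin_width) for value in values)
    return {
        bin_index * bin_width: 100.0 * counts[bin_index] / len(values)
        for bin_index in sorted(counts)
    }


# Hypothetical neighborhood populations.
populations = [50, 150, 160, 3050]
print(histogram_percent(populations, 100))
```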


Our primary decision in class assignment is where to place the class boundaries. Should we place the boundary between the low and medium population classes at 1,000, or at 1,200? Where should the boundary between medium and high population classes be placed? The location of the class boundaries will change the appearance of the map, and also the resulting classification.

One common method for automatic classification specifies the number of output classes and requests equal-interval classes over the range of input values. This equal-interval classification simply subtracts the lowest value of the classification variable from the highest value, and defines equal-width boundaries to fit the desired number of classes into the range.


Figure 9-17 illustrates an equal-interval classification for the population variable. Three classes are specified over the range of 0 to 5,133. Each interval is one-third of this range, or 1,711 units wide. The small class extends from 0 to 1,711, the medium class from 1,712 to 3,422, and the large class from 3,423 to 5,133. Population categories are shown colored accordingly on the map and the bar graph, with the small (white), medium (gray), and large (black) classes shown.
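
The equal-interval boundaries for this example can be reproduced with a short sketch. The function names are hypothetical, and placing a value that equals a boundary in the lower class is an assumption chosen to match the class limits quoted above.

```python
from bisect import bisect_left


def equal_interval_breaks(low, high, n_classes):
    """Interior class boundaries that split [low, high] into equal widths."""
    width = (high - low) / n_classes
    return [low + i * width for i in range(1, n_classes)]


def classify(value, breaks):
    """Class index for a value; a value equal to a break stays in the lower class."""
    return bisect_left(breaks, value)


labels = ["small", "medium", "large"]
breaks = equal_interval_breaks(0, 5133, 3)
print(breaks)

for population in (1711, 1712, 3422, 3423, 5133):
    print(population, labels[classify(population, breaks)])
```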

Note that the low population class shown in white dominates the map; most of the neighborhoods fall in this population class. This often happens when there are features that have values much higher than the norm. There are a few neighborhoods with populations above 5,000 (to the right of the break in the population axis of the bar graph), while most neighborhoods have populations below 3,000. The outliers shift the class boundaries to higher values, 1,711 and 3,422, resulting in most neighborhoods falling in the small population category.

Figure 9-17: An equal-interval classification. The range 0 to 5,133 is split into three equal parts. Colors are shown on the map (left) and in the bar graph (right). Note the relatively few polygons in the high class, shown in black. A few outliers with populations near 5,000 shift the class boundaries.

Another common method for class assignment results in an equal-area classification (Figure 9-18). Class boundaries are defined to place an equal proportion of the study area into each of a specified number of classes. This usually leads to a visually balanced map because all classes have approximately equal extents. Equal-area classes are often desirable, for example, when resources need to be distributed over equal areas, or when equally sized overlapping sales territories may be specified.

Class width may change considerably with an equal-area classification compared to equal-interval. An equal-area classification sets class boundaries so that each class covers approximately the same area. A class may consist of a few or even one large polygon. This results in a small range for the large polygon classes. Classes also tend to have a narrow range of values near the peaks in the histogram. Many polygons are represented at the histogram peaks, and so these may correspond to large areas. Both of these effects are illustrated in Figure 9-18. The middle class of the equal-area classification occurs at population values between 903 and 1,223. This range of populations is near the peak in the frequency histogram, and these population levels are associated with larger polygons. This middle class spans a range of approximately 300 population units, while the small and large classes span near 900 and 4,000 population units, respectively.

Equal-area assignments may be highly skewed when there are a few polygons with large areas, and these polygons have similar values. Although not occurring in our example, there may be a relationship between the population and area for a few neighborhoods. Suppose in a data set similar to ours there is one very large neighborhood dominated by large parks. This neighborhood has both the lowest population and largest area. An equal-area classification may place this neighborhood in its own class. If a large parcel also occurs with high population levels, we may get three classes: one parcel in the small class, one parcel in the high population class, and all the remaining parcels in the medium population class. While most equal-area classifications are not this extreme, unique parcels may strongly affect class ranges in an equal-area classification.

[Figure 9-18 panel title: Equal-area classification. Bar graph axes: population; frequency (%).]
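
One simple way to approximate an equal-area classification is to sort polygons by the classification variable and cut the sorted list where cumulative area passes equal shares of the total. The sketch below uses that assumed rule with hypothetical (population, area) pairs; actual software may place the boundaries differently.

```python
def equal_area_classes(features, n_classes):
    """Assign a class index to each (value, area) feature.

    Features are ranked by value, and a feature's class is set by the
    share of total area accumulated before it is added.
    """
    total_area = sum(area for _, area in features)
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    classes = [None] * len(features)
    cumulative = 0
    for i in order:
        share = cumulative * n_classes / total_area
        classes[i] = min(int(share), n_classes - 1)
        cumulative += features[i][1]
    return classes


# Hypothetical neighborhoods as (population, area) pairs.
neighborhoods = [(950, 1), (100, 4), (4000, 3), (900, 1), (1200, 1), (1000, 2)]
classes = equal_area_classes(neighborhoods, 3)
print(classes)

# Total area that ends up in each class.
area_by_class = [0, 0, 0]
for (_, area), cls in zip(neighborhoods, classes):
    area_by_class[cls] += area
print(area_by_class)
```

In this toy data set the single large, low-population polygon fills the first class on its own, while the middle class covers a narrow range of population values.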

We will cover a final method for automated classification, a method based on natural breaks, or gaps, in the data (Figure 9-19). Natural breaks classification looks for "obvious" breaks. It attempts to identify naturally occurring clusters of data; not clusters based on the spatial relationships, but rather clusters based on an ordering variable.

There are various methods used to identify natural breaks. Large gaps in an ordered list of values are one common method. The values are listed from lowest to highest, and the largest gaps in values selected. Alternatively, low points in the frequency histogram may be used to identify breaks. There is usually an effort to balance the need for relatively wide and evenly distributed classes and the search for natural gaps. Many narrow classes and one large class may not be acceptable in many instances, and there may be cases where the specified number of gaps does not occur in the data histogram. More classes may be requested than obvious gaps, so some natural break methods include an alternative method, for example, equally spaced intervals, for portions of the histogram where no natural gaps occur.

[Figure 9-19 panel title: Natural breaks classification. Bar graph axis: frequency (%).]
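
The largest-gap approach can be sketched in a few lines. This is only the simple gap rule described above, with hypothetical population values; it is not an optimization-based natural breaks method, and placing each break at the midpoint of a gap is an assumption.

```python
def largest_gap_breaks(values, n_classes):
    """Place class breaks at the midpoints of the largest gaps in the sorted values."""
    ordered = sorted(set(values))
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[: n_classes - 1]
    return sorted((ordered[i] + ordered[i + 1]) / 2 for _, i in widest)


# Hypothetical neighborhood populations with two obvious gaps.
populations = [100, 150, 200, 1300, 1350, 1400, 2600, 2700]
print(largest_gap_breaks(populations, 3))
```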


Figure 9-19 illustrates a natural break classification. Two breaks are evident in the histogram, one near 1,300 and one near 2,200 persons per neighborhood. Small, medium, and large populations are assigned at these junctures.

Figure 9-17 through Figure 9-19 illustrate an important point: you must be careful when interpreting class maps, because the apparent relative importance of categories may be manipulated by altering the starting and ending values in each class. Figure 9-17 suggests most neighborhoods are low population, Figure 9-18 suggests that high population neighborhoods cover the largest areas and that they are well mixed with areas of lower population, while Figure 9-19 indicates the area is dominated by low and medium population neighborhoods. Precisely because there are no objectively defined population boundaries, we have great flexibility in manipulating the impression we create. The legend in class maps should be scrutinized, and the range between class boundaries noted.


The Modifiable Areal Unit Problem

A fundamental challenge in spatial analysis is that polygons may be reclassified and grouped in many ways when there are no objectively recognizable categories. Consider census data. These data are collected at the level of individuals and households but are reported at the aggregate level of census blocks, block groups, tracts, or counties. An analyst using fine-scaled block data may elect to group blocks together in order to more readily map or analyze the data. The most common reason for grouping basic units like blocks is wanting to match them to existing analytical frames such as school districts, neighborhoods, or watersheds. Spatially grouping blocks entails aggregating their values for population, age, and income, or other census measures. These subsequent values will depend on the size, shape, and location of the aggregated polygons, as seen for mean population age in Figure 9-20.


This general phenomenon whereby attribute values vary by spatial grouping, known as the modifiable areal unit problem or MAUP, has been exploited by politicians to redraw political boundaries to one party or another's (but generally not the country's) advantage. The process of aggregating neighborhoods to create majority blocks for political advantage was named gerrymandering after Massachusetts governor Elbridge Gerry, when he crafted a political district shaped like a salamander.

There are two primary characteristics of the MAUP that may be manipulated to affect aggregate polygon values. The first is the zoning effect, that aggregate statistics may change by the shape of the units, and the second is the size effect, that aggregate statistics may change with the area of the units. For example, the mean income of a unit will change when the boundaries of the unit change, either because of a change in zone or a change in size.
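
The zoning effect can be demonstrated with a toy calculation. The four blocks, their mean ages and populations, and the two alternative zonings below are all hypothetical; the sketch computes a population-weighted mean age for each zone under each grouping.

```python
# Hypothetical census blocks: (mean age, population).
blocks = [(20, 100), (30, 100), (60, 100), (70, 100)]

# Two hypothetical ways of grouping the same four blocks into two zones.
zoning_1 = {"zone_1": [0, 1], "zone_2": [2, 3]}
zoning_2 = {"zone_1": [0, 2], "zone_2": [1, 3]}


def zone_mean_age(blocks, zoning):
    """Population-weighted mean age for each zone in a zoning scheme."""
    means = {}
    for zone, members in zoning.items():
        people = sum(blocks[i][1] for i in members)
        weighted_age = sum(blocks[i][0] * blocks[i][1] for i in members)
        means[zone] = weighted_age / people
    return means


print(zone_mean_age(blocks, zoning_1))
print(zone_mean_age(blocks, zoning_2))
```

The underlying blocks are identical in both cases, yet the first zoning suggests one young and one old zone while the second suggests two zones of similar age.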

MAUP effects may substantially influence the values for each unit, and hence subsequent analysis. Openshaw and Taylor published results in 1979 that illustrate MAUP dependencies particularly well. They analyzed the percentage of elderly voters and the number of Republican voters for the 99 counties in Iowa. They showed that the correlation between the elderly voters and Republican voter numbers ranged from 0.98 to -0.51 by varying the scale and aggregation units that grouped counties. Additional work has shown that multivariate statistical models based on aggregate data are similarly dependent on the aggregation units, leading to contradictory results and predictions.

Numerous studies of the MAUP have shown how to identify and/or reduce the primary negative impacts of the zoning and size effects. The primary recommendation is to work with the most basic units of measure.


In our census example, this would be to collect and maintain information on the individual or household. Aggregation is typically required to maintain the anonymity of the census respondents. However, many efforts allow recording and maintaining data on the primary units within a GIS framework, and this should be done when possible. With census data or other confidential information, the next best thing is to use the finest units available when possible.

A second way to address the MAUP is based on optimal zoning. Zones are designed to maximize variation between zones while minimizing variation within zones. Optimal zones are difficult to define for more than one variable, because variables often do not change in concert. For example, an optimal set of zones for determining traffic densities may not be optimal for average age. Old people are no more nor less likely to live near busy or low traffic roads. Optimum zoning approaches are best applied when interest in one variable predominates.

Another approach to solving the MAUP involves conducting a set of sensitivity analyses. Units are aggregated and rezoned across a range of sizes and shapes, and the analyses are performed for each set. Changes in the results are observed, and the sensitivity to zone boundaries and sizes noted. These tests may identify the relative sensitivities of different variables to size and zoning effects. Robust results may be identified; for example, average age may not change over a range of sizes and unit combinations, yet may change substantially over a narrow range of sizes in some areas. This approach requires many computations, because it uses replicated runs for each set of variables, zone levels, and shapes. This often overwhelms the available computing resources for many problems and agencies.
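A sensitivity analysis of this kind might be sketched, in a much reduced form, as a loop that repeats one analysis over every candidate zoning. The example below assumes a single row of basic units with equal populations and invented mean ages, and it only varies one boundary position; a real study would vary sizes and shapes in two dimensions.

```python
from statistics import mean

# Hypothetical mean age for eight basic units arranged west to east.
# Equal unit populations are assumed, so a simple mean is adequate here.
ages = [31, 35, 42, 58, 61, 64, 33, 29]

def zone_means(values, cut):
    """Mean age of the western and eastern zones when the boundary is placed at `cut`."""
    return mean(values[:cut]), mean(values[cut:])

# Repeat the same analysis for every possible boundary position.
runs = {cut: zone_means(ages, cut) for cut in range(1, len(ages))}

# Record how strongly the result (the west-east difference) depends on the boundary.
gap = {cut: abs(west - east) for cut, (west, east) in runs.items()}
largest_gap_cut = max(gap, key=gap.get)
smallest_gap_cut = min(gap, key=gap.get)

for cut in sorted(runs):
    print(cut, runs[cut], round(gap[cut], 2))
```

Comparing the runs shows which boundary placements change the conclusion and which leave it stable, which is the information a sensitivity analysis is meant to provide.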
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Dissolve 

A dissolve function is primarily used to combine similar features within a data layer. Adjacent polygons may have identical values for an attribute. For example, a wetlands data layer may specify polygons with several subclasses, such as wooded wetlands, herbaceous wetlands, or open water. If an analysis requires that we identify only the wetland areas vs. the upland areas, then we may wish to dissolve all boundaries between adjacent wetlands. We are only interested in preserving the wetland-upland boundaries.

Dissolve operations are usually applied based on a specific "dissolve" attribute associated with each feature. A value or set of values is identified that belongs in the same grouping. Each line that serves as a boundary between two polygon features is assessed. The values for the dissolve attribute are compared across the boundary line. If the values are the same, the boundary line is removed, or dissolved away. If the values for the dissolve attribute differ across the boundary, the boundary line is left intact.
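The boundary comparison described above can be sketched as follows. The data structure (each boundary line recording the polygon on either side) and the names `boundaries`, `dissolve_value`, and `dissolve_boundaries` are assumptions made for this illustration, not the layout of any particular GIS package.

```python
# Each boundary line records the polygons on its two sides (hypothetical layout).
boundaries = [
    {"line": 1, "left": "A", "right": "B"},
    {"line": 2, "left": "B", "right": "C"},
    {"line": 3, "left": "C", "right": "D"},
    {"line": 4, "left": "A", "right": "D"},
]

# The dissolve attribute for each polygon.
dissolve_value = {"A": "wetland", "B": "wetland", "C": "upland", "D": "wetland"}

def dissolve_boundaries(boundaries, dissolve_value):
    """Keep a boundary line only when the dissolve attribute differs across it."""
    return [b for b in boundaries
            if dissolve_value[b["left"]] != dissolve_value[b["right"]]]

kept = dissolve_boundaries(boundaries, dissolve_value)
print([b["line"] for b in kept])
```

Lines 1 and 4 separate two wetland polygons and are dissolved away; lines 2 and 3 separate wetland from upland and are retained.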

Figure 9-21 illustrates the dissolve operation that produces a binary classification. This classification places each state of the contiguous United States into one of two categories, those entirely west of the Mississippi River (1) and those east of the Mississippi River (0). The attribute named is_west contains values indicating location. A dissolve operation applied on the variable is_west removes all state boundaries between similar states. This reduces the set from 48 polygons to two polygons.


Figure 9-21: A dissolve operation. Boundaries are removed when they separate states with the same value for the dissolve attribute.


Dissolve operations are often needed prior to applying an area-based selection in spatial analysis. For example, we may wish to select areas from the natural breaks classification shown on the left of Figure 9-22. We seek polygons that are greater than 3 mi² in area and have a medium population. The polygons may be composed of multiple neighborhoods. We typically must dissolve the boundaries between adjacent polygons of the same class before applying the area test. For example, two adjacent medium population polygons may each be approximately 2 mi². Their total area is 4 mi², above the specified threshold, yet they will not be selected unless a dissolve is applied first.
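The need to dissolve before an area-based selection can be sketched with a small example. The polygon classes, areas, and adjacency list below are invented, and the grouping uses a simple union-find structure as one possible way to merge adjacent polygons of the same class; it is not presented as the method used by any specific software.

```python
def dissolve_groups(classes, adjacency):
    """Group adjacent polygons that share a class; returns a list of sets of polygon ids."""
    parent = {p: p for p in classes}

    def find(p):
        # Follow parent links to the group representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in adjacency:
        if classes[a] == classes[b]:
            parent[find(a)] = find(b)

    groups = {}
    for p in classes:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# Hypothetical neighborhoods: population class, area in square miles, shared borders.
classes = {1: "medium", 2: "medium", 3: "high", 4: "medium"}
areas = {1: 2.0, 2: 2.0, 3: 5.0, 4: 1.5}
adjacency = [(1, 2), (2, 3), (3, 4)]
threshold = 3.0

# Selection applied directly to the undissolved polygons.
before = [p for p in classes
          if classes[p] == "medium" and areas[p] > threshold]

# Selection applied after dissolving adjacent polygons of the same class.
after = [g for g in dissolve_groups(classes, adjacency)
         if classes[next(iter(g))] == "medium"
         and sum(areas[p] for p in g) > threshold]

print(before, after)
```

Without the dissolve no polygon passes the area test; after the dissolve, polygons 1 and 2 form a single 4 mi² medium population area that is selected.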


Figure 9-22: An example of a dissolve operation, before the dissolve (left) and after the dissolve (right). Note the removal of lines separating polygons of the same class. This greatly reduces the number of polygons.

Dissolves are also helpful in removing unneeded information. After the classification into small, medium, and large size classes, many boundaries may become redundant. Unneeded boundaries may be removed with a dissolve.

Figure 9-22 illustrates the space saving and complexity reduction common when applying a dissolve function. The number of polygons is reduced approximately ninefold by the dissolve, from 1,074 on the left to 119 polygons on the right of Figure 9-22.

Attribute Aggregation in a Dissolve Operation


We often wish to transfer information in the attribute table when applying a dissolve function. For example, we may wish to find the total population of a set of neighborhoods that are within walking distance of a set of bus stops. Each neighborhood polygon contains a population count attribute, and we wish to sum the values across the neighborhoods that correspond to each bus stop. The new dissolved polygons, representing the neighborhoods closest to each bus stop, will contain a summed population variable. We might then do further analysis to identify areas where new bus stops might be needed, or to recommend a change in bus frequency.


Aggregation functions allow us to preserve information in a dissolve operation. Typically we may sum, average, calculate the range, maximum, minimum, or other common statistics, and assign these to the output polygons. The functions first identify the adjacent polygons that will be combined to form the new polygons, and then apply the specified operation to target attribute variables. Figure 9-23 diagrams a dissolve that sums cost across input polygons. Adjacent polygons of the same type are combined, and the values of the component polygons summed. Different analyses might require different aggregation statistics, such as the average, maximum, or range.
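Attribute aggregation during a dissolve can be sketched as a group-and-summarize step. The example assumes the adjacency test has already been done, so that all polygons sharing a `type` value form one output polygon; the polygon records, cost values, and the names `AGGREGATORS` and `aggregate_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical input polygons. Polygons of the same type are assumed to be adjacent,
# so each type becomes one dissolved output polygon.
polygons = [
    {"id": 1, "type": "A", "cost": 10},
    {"id": 2, "type": "A", "cost": 5},
    {"id": 3, "type": "B", "cost": 7},
    {"id": 4, "type": "B", "cost": 1},
    {"id": 5, "type": "B", "cost": 4},
]

# Aggregation statistics that may be assigned to the output polygons.
AGGREGATORS = {
    "sum": sum,
    "mean": lambda v: sum(v) / len(v),
    "min": min,
    "max": max,
    "range": lambda v: max(v) - min(v),
}

def aggregate_by(polygons, key, field, how):
    """Collect `field` values for each distinct `key` value and summarize them."""
    groups = defaultdict(list)
    for poly in polygons:
        groups[poly[key]].append(poly[field])
    return {k: AGGREGATORS[how](v) for k, v in groups.items()}

print(aggregate_by(polygons, "type", "cost", "sum"))
print(aggregate_by(polygons, "type", "cost", "range"))
```

Choosing the statistic is an analytical decision: a sum suits counts and costs, while the mean, maximum, or range may suit other variables.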


Some of the variables might be nonsensical as inputs or as outputs, so care must be taken when aggregating during a dissolve. This is particularly true for area-averaged values, or for categorical or ordinal values as inputs. If an analyst averages the area-averaged population (for example, the population density) of two input polygons, the result is an erroneous average for the output polygon, except in the rare case when both polygons have the same area. Proof of this result is left to the reader.

Aggregates of categorical or ordinal variables also require caution, as the average or sum of ordinal or category values has meaning in relatively few applications. Aggregating land cover, for example, would make no sense if it averaged the numerical values for forest and agriculture.
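A short numerical check shows why a plain average of area-averaged values is wrong when polygon areas differ. The two polygons and their densities are invented for illustration.

```python
# Two hypothetical input polygons with an area and an area-averaged value
# (population density, in people per unit area).
polys = [
    {"area": 1.0, "density": 100.0},
    {"area": 9.0, "density": 10.0},
]

# Naive aggregate: the plain average of the two densities.
naive = sum(p["density"] for p in polys) / len(polys)

# Correct aggregate: total population divided by total area,
# which equals the area-weighted average of the densities.
total_area = sum(p["area"] for p in polys)
total_population = sum(p["density"] * p["area"] for p in polys)
weighted = total_population / total_area

print(naive, weighted)
```

The naive average is 55 people per unit area, while the dissolved polygon actually holds 190 people on 10 units of area, a density of 19. The two results agree only when the input areas are equal.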
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Proximity Functions and Buffering 


Proximity functions or operations are
among the most powerful and common 
spatial analysis tools. Many important 
questions hinge on proximity, the distance 
between features of interest. How close are 
schools to an oil refinery, what neighborhoods
are far from convenience stores, and
which homes will be affected by an 
increase in freeway noise? Many questions 
regarding proximity are answered through 
spatial analyses in a GIS. Here we focus on 
proximity functions that create new fea- 
tures and layers, rather than proximity 
selection, described earlier. 

Proximity functions modify existing 
features or create new features that depend 
in some way on distance. For example, one 
simple proximity function creates a raster 
of the minimum distance from a set of features
(Figure 9-25). The figure shows a distance
function applied to water holes in a
wildlife reserve. Water is a crucial resource
for nearly all animals, and the reserve managers
may wish to ensure that most of the
area is within a short distance of water. In
this instance point features are entered that 
represent the location of permanent water. 


Water holes are represented by individual
points, and rivers by a group of points set 
along the river course. A proximity func- 
tion calculates the distance to all water 
points for each raster cell. The minimum 
distance is selected and placed in an output 
raster data layer (Figure 9-25). The dis- 
tance function creates a mosaic of what 
appear to be overlapping circles. Although 
the shading scheme shows apparently 
abrupt transitions, the raster cells contain a 
smooth gradient in distance away from 
each water feature. 

Distance values are most often calculated based on the Pythagorean formula (Figure 9-26):

$$\text{distance} = \sqrt{x^2 + y^2}$$

where x and y are the offsets between the two locations along each axis. These values are typically calculated from cell center to cell center when applied to a raster data set. Although any distance is possible, the distances between adjacent cells change in discrete intervals related to the cell size. Note that distances are not restricted to even multiples of the cell size, because distances measured on diagonal angles are not even multiples of the cell dimension. There may be no cells that are exactly some fixed distance away from the target features; however, there may be many cells less than or greater than that fixed distance.
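A minimum distance function of this kind can be sketched for a small raster. The grid size, cell size, and water point locations are invented, and the function name `distance_raster` is hypothetical; the brute-force search over all targets is chosen for clarity, not efficiency.

```python
import math

def distance_raster(n_rows, n_cols, cell_size, targets):
    """Minimum center-to-center distance from every cell to the nearest target cell.

    `targets` is a list of (row, col) positions of cells containing a feature.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Pythagorean distance to each target; keep the smallest.
            row.append(min(math.hypot((r - tr) * cell_size, (c - tc) * cell_size)
                           for tr, tc in targets))
        grid.append(row)
    return grid

# Two hypothetical water points on a 3 x 4 raster with 10-unit cells.
water = [(0, 0), (2, 3)]
dist = distance_raster(3, 4, 10.0, water)
for row in dist:
    print([round(d, 1) for d in row])
```

Cells next to a water point along a row or column are one cell size away, while diagonal neighbors are about 1.41 cell sizes away, which is why distances are not restricted to even multiples of the cell size.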


Buffers 


Buffering is one of the most commonly
used proximity functions. A buffer is a
region that is less than or equal to a
specified distance from one or more fea- 
tures (Figure 9-27). Buffers may be deter- 
mined for point, line, or area features, and 
for raster or vector data. Buffering is the 
process of creating buffers. Buffers typi- 
cally identify areas that are "outside" some 
given threshold distance compared to those 
"inside" some threshold distance. 

Buffers are used often because many spatial analyses are concerned with distance constraints. For example, emergency planners may wish to know which schools are within 1.5 kilometers of an earthquake fault, a park planner may wish to identify all lands more than 10 kilometers from the nearest highway, or a business owner may wish to identify all potential customers within a given radius of her store. All these questions may be answered with the appropriate use of buffering.

Figure 9-26: A distance function applied to a raster data set, showing the distance from the nearest target cell.


Figure 9-27: Examples of vector and raster buffers derived from polygonal features. A buffer is defined by those areas that are within some buffer distance from the input features.
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Raster Buffers 


Buffer operations on raster data entail calculating the distance from each source cell center to all other cell centers. Output cells are assigned an in value whenever the cell-to-cell distance is less than the specified buffer distance. Those cells that are further than the buffer distance are assigned an out value (Figure 9-28).

Raster buffers combine a minimum distance function and a binary classification function. A minimum distance function calculates the shortest distance from a set of target features and stores this distance in a raster data layer. The binary classification function splits the raster cells into two classes: those with a distance greater than a threshold value, and those with a distance less than or equal to a threshold value.
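The two steps, a minimum distance calculation followed by a binary classification, can be combined in a short sketch. The grid dimensions, target cell, and the in and out codes of 1 and 0 are assumptions for this example, and `raster_buffer` is a hypothetical name.

```python
import math

def raster_buffer(n_rows, n_cols, cell_size, targets, buffer_distance,
                  in_value=1, out_value=0):
    """Classify each cell as inside or outside a buffer around the target cells."""
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Step 1: minimum center-to-center distance to any target cell.
            d = min(math.hypot((r - tr) * cell_size, (c - tc) * cell_size)
                    for tr, tc in targets)
            # Step 2: binary classification against the buffer distance.
            row.append(in_value if d <= buffer_distance else out_value)
        out.append(row)
    return out

# One hypothetical target cell at row 1, column 0; 10-unit cells; 15-unit buffer.
buf = raster_buffer(3, 5, 10.0, [(1, 0)], 15.0)
for row in buf:
    print(row)
```

With a 15-unit buffer on 10-unit cells, the direct and diagonal neighbors (10 and about 14.1 units away) fall inside the buffer, while cells two columns away (20 units or more) fall outside.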


Figure 9-28: A raster buffer with a buffer distance of 15 units, based on the distance from the nearest target cell.


Buffering with raster data may produce a "stair-step" boundary, because the distance from features is measured between cell centers. When the buffer distance runs parallel and near a set of cell boundaries, the buffer boundary may "jump" from one row of cells to the next (Figure 9-28). This phenomenon is most often a problem when the raster cell size is large relative to the buffer distance. A buffer distance of 100 m may only be approximated when applied to a raster with a cell size of 30 m. A smaller cell size relative to the buffer distance results in less obvious "stair-stepping." The cell size should be small relative to the spatial accuracy of the data, and small relative to the buffer distance. If this rule is followed, then stair-stepping should not be a problem, because buffer sizes should be many times greater than the uncertainty inherent in the data.


Vector Buffers 


Vector buffering may be applied to point, line, or area features, but regardless of input, buffering always produces an output set of area features (Figure 9-29). There are many variations in vector buffering. Simple buffering, also known as fixed distance buffering, is the most common form of vector buffering (Figure 9-29). Simple buffering identifies areas that are a fixed distance or greater from a set of input features. Simple buffering does not distinguish between regions that are close to one feature and those that are close to more than one feature. A location is either within a given distance from any one of a set of features, or farther away.


Simple buffering uses a uniform buffer 
distance for all features. A buffer distance 
of 100 meters specified for a roads layer
may be applied to every road in the layer 
irrespective of road size, shape, or location. 
In a similar manner, buffer distances for all
points in a point layer will be uniform, and 
buffer distances for all area features will be 
fixed. 


Chapter 9: Basic Spatial Analyses 391 


Figure 9-29: Vector buffers produced from point, line, or polygon input features.

Buffering on vector point data is based on the creation of circles around each point in a data set. The equation for a circle with an origin at x = 0, y = 0 is:

$$r = \sqrt{x^2 + y^2} \qquad (9.1)$$

where r is the buffer distance. The more general equation for a circle with a center at $x_1, y_1$ is:

$$r = \sqrt{(x - x_1)^2 + (y - y_1)^2} \qquad (9.2)$$

Equation (9.2) reduces to equation (9.1) at the origin, where $x_1 = 0$ and $y_1 = 0$. The general equation creates a circle centered on the coordinates $x_1, y_1$, with a buffer distance equal to the radius, r. Point buffers are created by applying this circle equation successively to each point feature in a data layer. The x and y coordinate locations of each point feature are used for $x_1$ and $y_1$, placing the point feature at the center of a circle.
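Equation (9.2) can be applied directly in code, either to generate an approximate buffer outline or to test whether a location falls inside a point buffer. The function names and the choice of 32 vertices below are assumptions for this sketch; vector GIS software stores a circle as a polygon with some finite number of vertices, but the number used varies.

```python
import math

def point_buffer(x1, y1, r, n_vertices=32):
    """Approximate the circle of radius r centered on (x1, y1) as a closed ring of vertices."""
    ring = [(x1 + r * math.cos(2 * math.pi * k / n_vertices),
             y1 + r * math.sin(2 * math.pi * k / n_vertices))
            for k in range(n_vertices)]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

def in_point_buffer(x, y, x1, y1, r):
    """True when (x, y) lies within distance r of the point (x1, y1), per equation (9.2)."""
    return math.hypot(x - x1, y - y1) <= r

ring = point_buffer(100.0, 200.0, 10.0)
print(len(ring), ring[0])
print(in_point_buffer(103.0, 204.0, 100.0, 200.0, 5.0))
print(in_point_buffer(106.0, 200.0, 100.0, 200.0, 5.0))
```

Every vertex of the ring satisfies equation (9.2) with the point feature at the center, and the membership test applies the same equation to an arbitrary location.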


Buffered circles may overlap, and in simple buffering, the circle boundaries that occur in overlap areas are removed. For example, areas within 10 km of hazardous waste sites may be identified by creating a buffer layer. We may have a data layer in which hazardous waste sites are represented as points (Figure 9-30a). A circle with a 10 km radius is drawn around each point. When two or more circles overlap, internal boundaries are dissolved, resulting in noncircular polygons (Figure 9-30b).

More complex buffering methods may be applied. These methods may identify buffer areas by the number of features within the given buffer distance, or apply variable buffer distances depending on the characteristics of the input features. We may be interested in areas that are near multiple hazardous waste sites. These areas near multiple hazardous sites may entail added risk and therefore require special monitoring or treatment. We may be mandated to identify all areas within a buffer distance of a hazardous waste site, and the number of sites. In most applications, most of the dangerous areas will be close to one hazardous waste site, but some will be close to two, three, or more sites. The simple buffer, described above, will not provide the required information.

A buffering variant, known as compound buffering, provides the needed information. Compound buffers maintain all overlapping boundaries (Figure 9-30c). All circles defined by the fixed radius buffer distance are generated. These circles are then intersected to form a planar graph. For each area, an attribute is created that records the number of features within the specified buffer distance.

Nested (or multiring) buffering is another common buffering variant (Figure 9-30d). We may require buffers at multiple distances. In our hazardous waste site example, suppose threshold levels have been established with various actions required for each threshold. Areas very close to hazardous waste sites require evacuation, intermediate distances require remediation, and areas farther away require monitoring. These zones may be defined by nested buffers.

Figure 9-30 (panel labels): a) point layer; b) simple buffer, overlap dissolved; c) compound buffer, overlap identified; d) nested buffer.
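The attributes that compound and nested buffers record can be sketched for a single query location, without constructing the buffer polygons themselves. The site coordinates, the three threshold distances, and the function names are all invented for this example.

```python
import math

# Hypothetical hazardous waste site coordinates, in km.
sites = [(0.0, 0.0), (12.0, 0.0), (6.0, 9.0)]

def sites_within(x, y, sites, distance):
    """Compound buffer attribute: number of sites within `distance` of (x, y)."""
    return sum(1 for sx, sy in sites if math.hypot(x - sx, y - sy) <= distance)

def nested_zone(x, y, sites, thresholds):
    """Nested buffer attribute: label of the innermost ring containing (x, y), else None."""
    nearest = min(math.hypot(x - sx, y - sy) for sx, sy in sites)
    for limit, label in sorted(thresholds):
        if nearest <= limit:
            return label
    return None

# Hypothetical action thresholds: (outer ring distance in km, required action).
thresholds = [(2.0, "evacuate"), (5.0, "remediate"), (10.0, "monitor")]

print(sites_within(6.0, 0.0, sites, 10.0))
print(nested_zone(1.0, 0.0, sites, thresholds))
print(nested_zone(0.0, 4.0, sites, thresholds))
```

A compound buffer layer stores the first kind of value for each polygon of the planar graph, and a nested buffer layer stores the second; the point queries here return the same attributes for one location at a time.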
Buffering on vector line and polygon data is also quite common. The formation of line buffers may be envisioned as a series of steps. First, circles are created that are centered at each node or vertex (Figure 9-31). Tangent lines are then generated. These lines are parallel to the input feature lines and tangent to the circles that are centered at each node or vertex. The tangent lines and circles are joined and interior circle segments dissolved.

Figure 9-31: The creation of a line buffer at a fixed distance r.

Variable distance buffers (Figure 9-32) are another common variant of vector buffering. As indicated by the name, the buffer distance is variable, and may change among features. The buffer distance may increase in steps; for example, we may have one buffer distance for a given set of features, and a different buffer distance for the remaining features. In contrast, the buffer distance may vary smoothly; for example, the buffer distance around a city may be a function of the population density in the city.

There are many instances for which we 
may require a variable distance buffer. We 
may wish to specify a larger buffer zone 
around large fuel storage facilities when 
compared to smaller fuel storage facilities. 
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We often require more stringent protec- 
tions further away from large rivers than 
for small rivers, and give large landfills a 
wider berth than small landfills. 

Figure 9-32 illustrates the creation of 
buffers around a river network, where dis- 
tance varies by river size. The increase in 
distance may be motivated by an increased 
likelihood of flooding downstream or an 
increased sensitivity to pollution. We may 
specify a buffer distance of 50 km for small 
rivers, 75 km for intermediate size rivers,
and 100 km for large rivers. There are 
many other instances when variable dis- 
tance buffers are required, for example, for 
road noise, smoke stacks, or landfills. 

The variable buffer distance is often specified by an attribute in the input data layer (Figure 9-32). A portion of the attribute table for the river data layer is shown. The attribute table contains the river name in river_identifier, and the buffer distance is stored in a second attribute. This buffer distance attribute is accessed during buffer creation, and the size of the buffer adjusted automatically for each line segment. Note how the buffer size depends on the value of the buffer distance attribute.

river_identifier    buffer distance
mississippi         100
missouri            50
arkansas            50
ohio                75
Tennessee           75
St croix            75
D                   75
wisconsin           75

Figure 9-32: An illustration of a variable distance buffer. A line buffer is shown with a variable buffer distance based on a river attribute. A variable buffer distance is specified in a table and applied for each river segment.
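A variable distance line buffer can be sketched as a membership test in which each river carries its own buffer distance. The river geometries below are invented straight lines, the attribute name `buffer_km` is a stand-in for the buffer distance attribute, and the function names are hypothetical. Testing the distance to each line segment is equivalent to the circle-and-tangent construction described for line buffers.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point (px, py) to the segment from (ax, ay) to (bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

# Hypothetical river lines (coordinates in km), each with its own buffer distance.
rivers = [
    {"river_identifier": "mississippi", "buffer_km": 100,
     "line": [(0, 0), (0, 500)]},
    {"river_identifier": "missouri", "buffer_km": 50,
     "line": [(-400, 300), (0, 300)]},
]

def rivers_buffering(px, py, rivers):
    """Names of rivers whose variable distance buffer contains the point (px, py)."""
    hits = []
    for river in rivers:
        pts = river["line"]
        d = min(point_segment_distance(px, py, *a, *b)
                for a, b in zip(pts, pts[1:]))
        if d <= river["buffer_km"]:  # the distance comes from the attribute table
            hits.append(river["river_identifier"])
    return hits

print(rivers_buffering(80, 100, rivers))
print(rivers_buffering(-200, 340, rivers))
```

A location 80 km from the larger river is inside its 100 km buffer, while a location 80 km from the smaller river would be outside its 50 km buffer.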
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Overlay 


Overlay operations are powerful spatial analysis tools, and were a driving force behind the development of GIS technologies. Overlays involve combining spatial and attribute data from two or more spatial data layers, and they are among the most common and powerful spatial data operations (Figure 9-33). Many problems require the overlay of thematically different data. For example, we may wish to know where there are inexpensive houses in good school districts, where whale feeding grounds overlap with proposed oil drilling areas, or the location of farm fields that are on highly erodible soils. In the latter example, a soils data layer may be used to identify highly erodible soils, and a current land use layer may be used to identify the locations of farm fields. The boundaries of erodible soils will not coincide with the boundaries of the farm fields in most instances, so these soils and land use data must somehow be combined. Overlay is the primary means of providing this combination.


An overlay operation requires that data layers use a common coordinate system. Overlay uses the coordinates that define each spatial feature to combine the data from the input data layers. The coordinates for any point on the Earth depend on the coordinate system used (Chapter 3). If the coordinate systems used in the various layers are not exactly the same, the features in the data layers will not align correctly.

Overlay may be viewed as the vertical stacking and merger of spatial data (Figure 9-33). Features in each data layer are set one "on top" of another, and the points, lines, or area feature boundaries are merged into a single data layer. The attribute data are also combined so that the new data layer includes the information contained in each input data layer.


Vector Overlay 


Overlay when using a vector data 
model involves combining the point, line, 
and polygon geometry and associated attri- 
bute data. This overlay creates new geome- 
try. Overlay involves the merger of both the 
coordinate and attribute data from two vec- 
tor layers into a new data layer. The coordi- 
nate merger may require the intersection 
and splitting of lines or areas and the cre- 
ation of new features. 

Figure 9-34 illustrates the overlay of two vector polygon data layers. This overlay requires the intersection of polygon boundaries to create new polygons. The overlay combines attribute data during polygon overlay. The data layer on the left is composed of two polygons. There are only two attributes for Layer 1, one an identifier (ID), and the other specifying values for a variable named class. The second input data layer, Layer 2, also contains two polygons, and two attributes, ID and cost. Note that the two tables have an attribute with the same name, ID. These two ID attributes serve the same function in their respective data layers, but they are not related. A value of 1 for the ID attribute in Layer 1 has nothing to do with the ID value of 1 in Layer 2. It simply identifies a unique combination of attributes in the output layer.

Figure 9-34 (panel labels): Layer 1 and Layer 2, each with geographic data.

Vector overlay of these two polygon data layers results in four new polygons. Each new polygon contains the attribute information from the corresponding area in the input data layers. For example, note that the polygon in the output data layer with the ID of 1 has a class attribute with a value of 0 and a cost attribute with a value of 10. These values come from the values found in the corresponding input layers. The boundary for the polygon with an ID value of 1 in the output data layer is a composite of the boundaries found in the two input data layers. The same holds true for the other three polygons in the output data layer. These polygons are a composite of geographic and attribute data in the input data layers.
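The polygon overlay described above can be sketched with axis-aligned rectangles standing in for general polygons. This is a strong simplification made for brevity: real vector overlay must intersect arbitrary boundaries. The layer contents, coordinates, and function names below are invented and only loosely follow the two-polygon layers described in the text.

```python
def rect_intersection(a, b):
    """Intersection of two rectangles given as (xmin, ymin, xmax, ymax), or None."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None

# Layer 1 carries a class attribute; Layer 2 carries a cost attribute.
layer1 = [{"ID": 1, "class": 0, "box": (0, 0, 5, 10)},
          {"ID": 2, "class": 1, "box": (5, 0, 10, 10)}]
layer2 = [{"ID": 1, "cost": 10, "box": (0, 0, 10, 5)},
          {"ID": 2, "cost": 5, "box": (0, 5, 10, 10)}]

def overlay(layer1, layer2):
    """Intersect every pair of features and merge their attributes into new features."""
    out = []
    for f1 in layer1:
        for f2 in layer2:
            box = rect_intersection(f1["box"], f2["box"])
            if box is not None:
                # A new output ID is assigned; it is unrelated to either input ID.
                out.append({"ID": len(out) + 1, "class": f1["class"],
                            "cost": f2["cost"], "box": box})
    return out

for feature in overlay(layer1, layer2):
    print(feature)
```

Two input polygons in each layer produce four output polygons, each with a new boundary and the attributes of both contributing features.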

The topology of vector overlay output
will likely be different from that of the 
input data layers. Vector overlay functions 
typically identify line intersections during 
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overlay. Intersecting lines are split and a 
node placed at the intersection point. Thus 
topology must be recreated if it is needed in 
further processing. 

Any type of vector feature may be overlaid with any other type of vector feature, although some overlay operations rarely provide useful information and are performed infrequently. In theory, points may be overlaid on point, line, or polygon feature layers, lines on all three types, and polygons on all three types. Point-on-point or point-on-line overlay rarely results in intersecting features, and so they are rarely applied. Line-on-line overlay is sometimes required, for example, when we wish to identify the intersections of two networks such as roads and railroads, but these also are rare occurrences. Overlays involving polygons are the most common by far.

Figure 9-35 (panel labels): input polygon layer, input point layer.

Overlay output typically takes the low- 
est dimension of the inputs. This means 
point-in-polygon overlay results in point 
output, and line-in-polygon overlay results 
in line output. This avoids problems when 
multiple lower dimension features intersect 
with higher dimension features. 

Figure 9-35 illustrates an instance where multiple points in one layer fall within a single polygon in an overlay layer. Output attribute data for a feature are a combination of the input data attributes. If polygons are output (Figure 9-35, right, top), there is ambiguity regarding which point attribute data to record. Each point feature has a value for an attribute named class. It is not clear which value should be recorded in the output polygon: the class value from point A, point B, or point C. When a point layer is output (Figure 9-35, right, bottom), there is no ambiguity. Each output point feature contains the original point attribute information, plus the input polygon feature attributes.
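Point-in-polygon overlay with point output can be sketched as below. The ray-casting test is one common way to decide containment and is an assumption here, not necessarily what a given GIS package uses; the polygon, the point records, and the output field names `poly_ID` and `poly_type` are invented. In this sketch, points that fall in no polygon are dropped.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices without a repeated end vertex."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

polygons = [{"ID": 1, "type": "forest",
             "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]}]
points = [{"name": "A", "class": 1, "xy": (2, 3)},
          {"name": "B", "class": 2, "xy": (5, 8)},
          {"name": "C", "class": 3, "xy": (12, 5)}]

def point_in_polygon_overlay(points, polygons):
    """Output takes the lower dimension: one point per input point that falls in a polygon."""
    out = []
    for pt in points:
        for poly in polygons:
            if point_in_polygon(*pt["xy"], poly["ring"]):
                record = dict(pt)  # keep the original point attributes
                record.update({"poly_ID": poly["ID"], "poly_type": poly["type"]})
                out.append(record)
                break
    return out

for record in point_in_polygon_overlay(points, polygons):
    print(record)
```

Each output point keeps its own class value and gains the attributes of the enclosing polygon, so no choice among several points is needed.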

One method for creating polygon output from point-in-polygon overlay involves recording the attributes for one point selected arbitrarily from the points that fall within a polygon. This is usually not satisfactory because information may be lost. An alternative involves adding columns to the output polygon to preserve multiple points per polygon. However, this would still result in some ambiguity, such as, what should be the order of duplicate attributes? It may also add a substantial number of sparsely used items, thus increasing file size inefficiently. Forcing the lower order output during overlay avoids these problems, as shown in the lower right of Figure 9-35.

Note that the number of attributes in the output layer increases after each overlay. This is illustrated in Figure 9-35, with the combination of a point and polygon layer in an overlay. The output point attribute table shown in the lower right portion of the figure contains four items. This output attribute table is a composite of the input attribute tables.

Large attribute tables may result if overlay operations are used to combine many data layers. When the output from an overlay process is in turn used as an input for a subsequent overlay, the number of attributes in the next output layer will usually increase. As the number of attributes grows, tables may become unwieldy, and we often delete redundant attributes.

Overlays that include a polygon layer are most common. We are often interested in the combination of polygon features with other polygons, or in finding the coincidence of point or line features with polygons. What counties include hazardous waste sites? Which neighborhoods does one pass through on Main Street? Where are there shallow aquifers below cornfields? All these examples involve the overlay of area features, either with other area features, or with point or line features.


Clip, Intersect, and Union: Special Cases of Overlay


There are three common ways overlay operations are applied: as a clip, an intersection, or a union.

The basic layer-on-layer combination is the same for all three. They differ in the geographic extent for which vector data are recorded, and in how attribute data from the layers are combined. Intersection and union are derived from general set theory operations. The intersection operation may be considered in some ways to be a spatial AND, while the union operation is related to a spatial OR. The clip operation may be considered a combination of an intersection and an elimination. All three are common and supported in some manner as stand-alone functions by most GIS software packages.
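The set theory analogy can be made concrete with a deliberately simplified sketch in which each layer is a dictionary from grid cell to attribute value, so that geographic extent becomes a set of cells. This is an illustration of the logic only and not a vector overlay; the county names, catchment codes, and function names are invented.

```python
# Each layer maps a grid cell (row, col) to an attribute value (hypothetical data).
counties = {(0, 0): "Adams", (0, 1): "Adams", (1, 0): "Brown", (1, 1): "Brown"}
catchments = {(0, 1): "c1", (1, 1): "c2", (1, 2): "c2", (2, 2): "c3"}

def clip(target, bounding):
    """Keep target cells inside the bounding layer; only target attributes are output."""
    return {cell: {"catchment": target[cell]}
            for cell in target if cell in bounding}

def intersect(a, b):
    """Spatial AND: cells present in both layers, with attributes from both."""
    return {cell: {"county": a[cell], "catchment": b[cell]}
            for cell in a.keys() & b.keys()}

def union(a, b):
    """Spatial OR: cells present in either layer; missing attributes become None."""
    return {cell: {"county": a.get(cell), "catchment": b.get(cell)}
            for cell in a.keys() | b.keys()}

print(clip(catchments, counties))
print(intersect(counties, catchments))
print(len(union(counties, catchments)))
```

The clip and the intersection cover the same two cells but differ in the attributes carried, while the union covers all six cells and leaves null values wherever one layer has no data.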

A clip may be considered a "cookie-cutter"
overlay (Figure 9-36). A bounding
polygon layer is used to define the areas for 
which features will be output. This bound- 
ing polygon layer defines the clipping 
region. Point, line, or polygon data in a sec- 
ond layer are "clipped" with the bounding 
layer. In most versions of the clip function, 
the attributes for the clipping layer are not 
included in the output data layer. A clip is 
most often used when sub-setting data geo- 
graphically, to reduce data volumes when 
working on a small area included in larger
data layers. A city manager may only wish
the set of streets within their city boundar- 
ies, clipped from a statewide roads layer. 


Figure 9-36: An illustration of a clip. The output covers the extent of the clipping layer, with the boundaries and attributes of only the clipped layer.

In our example shown in Figure 9-36, the bounding or clipping data layer consists of seven county polygons, and the target or clipped data layer contains many small catchment boundaries. The presence of polygon attributes in the bounding layer is indicated by the different shades for the different county polygons. The output from the clip consists of those portions of catchments within the clip layer boundary. Note that the clip layer boundaries, here counties, are not included in the output data layer. Also note that only the attributes for the clipped catchment layer are output.


Users should be certain that transferred 
variables still have valid values after a clip. 
If an area field is included in the input
layer, the value may be wrong if it is not
recalculated. Other area-based values may
also be in error, e.g., a polygon density or
counts of included features. Since software
defaults vary, the behavior of the specific 
software tool should be identified. 


An intersection is another multi-layer combination, and may be defined as an overlay that fuses data from both layers, but only for the area where both layers contain data (Figure 9-37). Feature boundaries from both data layers are combined. Both layers serve as data and as bounding layers, so that any parts of polygons that are in one layer but not another are clipped and discarded.

The same caution on relevancy and recalculation of combined, area-based
variables applies to the output from intersection operations as to clip
operations. Since new geographies with differing areas are created,
attributes transferred from the component features may need new
interpretations. Most implementations simply copy the values from the
input features to the coincident output features. For example, a county
area for each polygon will come from the input county polygon. While this
was valid for each input county in Figure 9-37, it will be replicated for
each polygon in the output layer, and should be interpreted as the area of
the contributing county, not the area from that county in the (usually
smaller) output polygon. Summing across polygons with a given county value
will usually not give an accurate county area. Each output variable should
be scrutinized and the value origin and contents clearly understood.

A union is another kind of overlay. It retains all data from both the
bounding and data layers (Figure 9-38). No geographic data are discarded
in the union, and corresponding attribute data are saved for all regions.
New polygons are formed by the combination of coordinate data from each
data layer.


Note that there are often many null or empty attribute values in union
output layers. Data in non-overlapping areas are absent and so cannot be
assigned, e.g., outside the county layer bounds but within the watershed
layer bounds in Figure 9-38. The presence of null values may alter
subsequent operations.


Many software packages support additional variants of overlay operations.
Some support an Erase or similarly named function, which is the complement
to the clip function. In an Erase function, areas covered by the input
layer are "cut out" or erased from the bounding layer (Figure 9-39).
Erasures may cut existing polygons apart or, where there are coincident
lines in the two data layers, may preserve the edge of existing polygons.
In some versions, there is a tolerance distance that allows for lines that
aren't exactly coincident in different layers, but are meant to be, to be
represented only once in the output. This tolerance distance effectively
serves as a snapping distance, and moves the vertices in one layer on a
nearly coincident edge to match vertices in the other layer. As with
snapping during digitization or other overlays, this may help reduce
incorrect or unwanted geometries.

The Erase function is particularly useful when updating a portion of a
data layer, in that old, out of date, poorer quality, or otherwise
inferior data may be clipped out of a section and newer data substituted.
Erase is also often useful in spatial analyses that include criteria
specifying areas that are greater than some distance from a set of
features. A buffer function identifies areas that are less than the target
distance, and these may then be removed from consideration using an erase
operation.


There are other variants on unions or intersections. Most of these
specialized overlay operations may be created from the application of
union or overlay operations in combination with selection operations.

Vector overlay is often a time-consuming computational process, due to the
large number of lines that must be compared. A vector overlay typically
requires repeated tests of line intersection, a relatively simple set of
calculations, but there is often a large number of line segments in a data
set. Each line segment must be checked against every other line segment,
requiring perhaps billions of tests for line intersection.
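
The pairwise testing described above can be sketched as follows. This is a minimal illustration, not the method of any particular GIS package: the function names and the sample segments are hypothetical, and production software usually adds spatial indexing to avoid testing every pair.

```python
from itertools import product


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def on_segment(p, q, r):
    """True if r lies inside the bounding box of segment pq (collinear case)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(seg_a, seg_b):
    """Test whether two segments, each a pair of (x, y) points, touch or cross."""
    p1, q1 = seg_a
    p2, q2 = seg_b
    o1 = orientation(p1, q1, p2)
    o2 = orientation(p1, q1, q2)
    o3 = orientation(p2, q2, p1)
    o4 = orientation(p2, q2, q1)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear endpoints that fall on the other segment also count.
    if o1 == 0 and on_segment(p1, q1, p2):
        return True
    if o2 == 0 and on_segment(p1, q1, q2):
        return True
    if o3 == 0 and on_segment(p2, q2, p1):
        return True
    if o4 == 0 and on_segment(p2, q2, q1):
        return True
    return False


def count_intersections(layer_a, layer_b):
    """Brute force: every segment in one layer is tested against every segment in the other."""
    tests = 0
    hits = 0
    for seg_a, seg_b in product(layer_a, layer_b):
        tests += 1
        if segments_intersect(seg_a, seg_b):
            hits += 1
    return tests, hits


# Hypothetical layers with two segments each: 2 x 2 = 4 tests.
layer_a = [((0, 0), (2, 2)), ((0, 5), (5, 5))]
layer_b = [((0, 2), (2, 0)), ((10, 10), (11, 11))]
print(count_intersections(layer_a, layer_b))  # (4, 1)
```

The number of tests is the product of the segment counts, so two layers with 100,000 segments each would need ten billion tests under this naive scheme.
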




A Problem in Vector Overlay


Polygon overlays often suffer when there are common features that are
represented in both input data layers. We define a common feature as a
representation of the same phenomenon in different layers. Figure 9-40
illustrates this problem. A county boundary may coincide with a state
boundary. However, different versions of the state and county boundaries
may be created separately from two adjacent states, using different source
materials, at different times, and using different systems. Thus, these
two representations may differ even though they identify the same boundary
on the earth surface.

In most data layers, the differences will be quite small, and will not be
visible except at very large display scales, for example, when the
on-screen zoom is quite high. The differences are shown in the larger-scale
inset in Figure 9-40. When the county and state data layers are overlaid,
many small polygons are formed along the boundary. These polygons are
quite small, but they are often quite numerous.

These "sliver" polygons cause problems because there is an entry in the
attribute table for each polygon. One-half or more of the polygons in the
output data layer may be these slivers. Slivers are a burden because they
take up space in the attribute table but are not of any interest or use.
Analyses of large data sets are hindered because all selections, sorts, or
other operations must treat all polygons, including the slivers.
Processing times often increase exponentially with the number of polygons.

There are several methods to reduce the occurrence of these slivers. One
identifies all common boundaries across different layers. The boundary
with the highest coordinate accuracy is substituted into all other data
layers, replacing the less accurate representations. This involves
considerable editing, and is most common when developing new data layers.


Another method involves manually identifying and removing slivers. Small
polygons may be selected, or polygons with two bounding arcs, which are
common for sliver polygons. Bounding lines may then be adjusted or
removed. However, manual removal is not practical for many data sets due
to the high number of sliver polygons.

A third method for sliver reduction involves defining a snap distance
during overlay. Much as with a snap distance used during data development
(described in Chapter 4), this forces nodes or lines to be coincident if
they are within a specified proximity during overlay. As with data entry,
this snap distance should be small relative to the spatial accuracy of the
input layers and the required accuracy of the output data layers. If the
two representations of a line are within the snap distance then there will
be no sliver polygons. In practice, not all sliver polygons are removed,
but their numbers are substantially reduced, thereby reducing the time
spent on manual editing.

Automatic sliver detection and removal should be applied carefully, as
they may delete valuable data. Only small slivers should be removed, with
small defined as smaller than an area, length, or width that is worth
tracking for a given problem or analysis. This distance may be set from
the positional accuracy of the data: if polygon edge locations are only
known to within one meter of their true position, it makes little sense to
maintain polygons that are less than a meter in any dimension. However, if
slivers are removed that are substantially wider and longer than a meter,
some valuable information may be lost.
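
One way to flag candidate slivers automatically is to compare an approximate polygon width against a minimum width tied to the data accuracy. The sketch below is only an illustration under stated assumptions: the width estimate 2 × area / perimeter is a rough heuristic that approaches the true width for long, thin shapes, and the function names and the one-meter threshold are hypothetical.

```python
import math


def polygon_area(ring):
    """Area of a simple polygon from its vertex ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(ring):
    """Sum of edge lengths around the ring."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


def is_sliver(ring, min_width):
    """Flag polygons whose approximate width falls below min_width.

    For a long rectangle of length L and width w, 2 * area / perimeter
    equals L * w / (L + w), which is close to w when L is much larger than w.
    """
    perimeter = polygon_perimeter(ring)
    if perimeter == 0:
        return True  # degenerate polygon
    approx_width = 2.0 * polygon_area(ring) / perimeter
    return approx_width < min_width


# Hypothetical polygons, coordinates in meters.
thin = [(0.0, 0.0), (100.0, 0.0), (100.0, 0.5), (0.0, 0.5)]
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(is_sliver(thin, min_width=1.0), is_sliver(square, min_width=1.0))  # True False
```

Flagged polygons would still deserve a review before deletion, in keeping with the caution above.
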

Raster Overlay


Raster overlay involves the cell-by-cell combination of two or more data
layers. Data from one layer in one cell location correspond to a cell in
another data layer. The cell values are combined in some manner and an
output value assigned to a corresponding cell in an output layer.

Raster overlay is typically applied to nominal or ordinal data. A number
or character stored in each raster cell represents a nominal or ordinal
category. Each cell value corresponds to a category for a raster variable.
This is illustrated in the input data sets shown at the left and center of
Figure 9-41. Input Layer A represents soils data. Each raster cell value
corresponds to a specific soil value. In a similar manner, input Layer B
records land use, with values 1, 2, and 3 corresponding to particular land
uses. These data may be combined to create areas fusing the two input
layers: cells with values for both soil type and land use.

There are as many potential output categories as there are possible
combinations of input layer values. In Figure 9-41 there are two soil
types in Layer A and three land use types in Layer B. There may be 3 x 2
or 6 different combinations in the output layer. Not all combinations will
necessarily occur in the overlay, as shown in Figure 9-41. In this example
only four of the possible overlay combinations occur. Unique identifiers
must be generated for each observed combination, and placed in the
appropriate cell of the output raster layer.
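
The cell-by-cell combination and the generation of unique identifiers can be sketched in a few lines. This is an illustrative sketch only: the function name and the small grids are hypothetical and are not the values shown in Figure 9-41.

```python
def raster_overlay(layer_a, layer_b):
    """Combine two equally sized grids, assigning one code per observed value pair."""
    if len(layer_a) != len(layer_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    ):
        raise ValueError("input rasters must have the same dimensions")
    codes = {}   # (value_a, value_b) -> output identifier
    output = []
    for row_a, row_b in zip(layer_a, layer_b):
        out_row = []
        for pair in zip(row_a, row_b):
            if pair not in codes:
                codes[pair] = len(codes) + 1  # next unused identifier
            out_row.append(codes[pair])
        output.append(out_row)
    return output, codes


# Hypothetical inputs: two soil types and three land uses (six possible pairs).
soils = [["A", "A", "B"],
         ["A", "B", "B"]]
land_use = [[1, 2, 2],
            [3, 2, 2]]
combined, lookup = raster_overlay(soils, land_use)
print(combined)     # [[1, 2, 3], [4, 3, 3]]
print(len(lookup))  # 4 of the 6 possible combinations occur
```

The lookup table plays the role of the output attribute table, recording which input categories each output identifier represents.
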

The number of possible combinations is important to note because it may
change the number of binary digits or bytes required to represent the
output raster data layer. A raster cell typically contains a number or
character, and may be a one-byte integer, a two-byte integer, or some
other size. Raster data sets typically use the smallest required data
size. As discussed in Chapter 2, one unsigned byte may store up to 256
different values. Raster overlay may result in an output data layer that
requires a higher number of bytes per cell. Consider the overlay between
two raster data layers, one layer that contains 20 different nominal
classes, and a second layer with 27 different nominal classes. There is a
total of 20 times 27, or 540, possible output combinations. If more than
256 combinations occur, the output data will require more than one byte
for each cell. Typically two bytes will be used. This causes a doubling in
the output file size. Two bytes will hold more than 65,500 unique
combinations; if more categories are required, then four bytes per cell
are often used.
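
The storage arithmetic above can be checked with a short calculation. The helper name is hypothetical, and the sketch assumes unsigned integer cells in sizes of one, two, four, or eight bytes.

```python
def bytes_per_cell(n_categories):
    """Smallest common unsigned integer size that can hold n_categories distinct codes."""
    for n_bytes in (1, 2, 4, 8):
        if n_categories <= 256 ** n_bytes:
            return n_bytes
    raise ValueError("too many categories for an 8-byte cell")


possible = 20 * 27               # classes in the two input layers
print(possible)                  # 540
print(bytes_per_cell(possible))  # 2
print(bytes_per_cell(256))       # 1
```
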


Raster overlay requires that the input raster systems be compatible. This
typically means they should have the same cell dimension and coordinate
system, including the same origin for x and y coordinates. If the cell
sizes differ, there will likely be cells in one layer that match parts of
several cells in the second input layer (Figure 9-42). This may result in
ambiguity when defining the input attribute value. Overlay may work if the
cells are integer multiples with the same origin, for example, the
boundaries of a 1 by 1 meter raster layer may be set to coincide with a 3
by 3 meter raster layer; however, this rarely happens. Data are normally
converted to compatible raster layers before overlay. This is most often
done by resampling, as described in Chapter 4. In our example, we might
choose to resample Layer 2 to match Layer 1 in cell size and orientation.
Values for cells in Layer 2 would be combined through a nearest neighbor,
bilinear interpolation, cubic convolution, or some other resampling
formula to create a new layer based on Layer 2 but compatible with
Layer 1.
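
As a rough illustration of the nearest neighbor option, the sketch below resamples a grid to a new cell size. It makes simplifying assumptions that are not from the text: both grids share the same upper-left origin, neither is rotated, and the function and parameter names are hypothetical.

```python
def resample_nearest(grid, src_cell, dst_cell, dst_rows, dst_cols):
    """Nearest neighbor resampling of grid to a new cell size.

    Each output cell takes the value of the input cell that contains
    the output cell's center.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    output = []
    for r in range(dst_rows):
        y = (r + 0.5) * dst_cell                      # output cell center
        src_r = min(int(y // src_cell), n_rows - 1)   # clamp at the grid edge
        row = []
        for c in range(dst_cols):
            x = (c + 0.5) * dst_cell
            src_c = min(int(x // src_cell), n_cols - 1)
            row.append(grid[src_r][src_c])
        output.append(row)
    return output


# A hypothetical 2 x 2 grid of 3 m cells resampled to 1 m cells (6 x 6).
coarse = [[1, 2],
          [3, 4]]
fine = resample_nearest(coarse, src_cell=3, dst_cell=1, dst_rows=6, dst_cols=6)
print(fine[0])  # [1, 1, 1, 2, 2, 2]
print(fine[5])  # [3, 3, 3, 4, 4, 4]
```

Nearest neighbor is the usual choice for nominal or ordinal data, because it never creates category values that were absent from the input.
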


Figure 9-42: Overlaid raster layers should be

An Example Spatial Analysis


Figure 9-43 and the following figures briefly illustrate an application of
basic spatial analysis. We seek to identify suitable areas for wind farms,
based on two criteria: areas with high average wind speeds and low
population density. High average winds are preferred because the energy
produced at a site increases with wind speed. Low population densities are
preferred because land is less expensive and there are fewer neighbors to
bother. This simple example does not include obvious additional factors,
such as the distance to power lines, avoiding protected lands, or the
difficulties of building offshore vs. onshore, but it does illustrate how
data may be combined in a set of simple spatial functions to answer a
question. This analysis requires wind data of appropriate accuracy and
spatial extent, appropriately summarized. For example, I might wish to
base my analysis on average daily wind speed, or maximum hourly wind speed
for a day, or maximum daily wind speed. If these data do not exist, I
would need to either change the problem formulation, my analysis methods,
or develop the data from existing sources. For example, if a gridded data
set does not exist, but there are point observations from a network of
weather stations, I might use interpolation or other methods (described in
Chapter 12) to estimate wind speed across the study region. These
considerations highlight an important early step in spatial analysis: we
must assess the available data, and determine if they are appropriate for
our intended analysis. If not, we must create the required data or modify
our analysis.

For this example, wind data were obtained from the U.S. Department of
Energy, and population data from the U.S. Census Bureau. Wind data were
reclassed to those values (4 or greater) that provided suitable potential
energy (Figure 9-43). We might represent this graphically as shown in
Figure 9-44, where the input and output layers are shown as boxes, and the
spatial operation noted inside an ellipse. Arrows show the direction or
flow of the analysis. The categories used in the reclassification should
be based on prior knowledge; here, we might know that wind levels below
category 4 are unsuitable for the wind turbines to be used. In general,
thresholds or selection values depend on the problem under consideration,
and should be well-justified by additional background information.


Selection of areas with a suitably sparse population is shown in Figure
9-45. Here, data are placed into two categories, sparse and dense. As with
the wind classification, the classes should be based on some external
information, e.g., land prices drop or the likelihood of irate neighbors
drops below a specific population density threshold. It also assumes the
input census data layer provides polygons with population density suitably
calculated, for example, in persons per square mile. If not, this
calculation will have to be performed prior to the reclassification.


Here the threshold for density is set at 10 persons per square mile, and
polygons reclassed accordingly (density below 10 is sparse, density above
10 is dense), creating a new data layer. A graphic representation of the
spatial operation is shown in Figure 9-46.

Reclassed layers were then combined in an overlay operation, and selected
to identify areas that have both low population densities and high wind
speeds (Figure 9-47). In practice, this will involve several steps in a
spatial operation, with separate overlay, selection, and reclassification
steps. These steps are abbreviated in Figure 9-47, showing just one general
operation. They are more fully sketched in Figure 9-48. Chapter 13 offers
more in-depth discussion of how we can use spatial analysis to develop
models that help solve problems like this one, where we are finding
locations for an activity.
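
The reclassify-then-overlay sequence can be sketched with two small grids. This is a simplified illustration rather than the workflow used to produce the figures: both inputs are assumed to be rasters on the same grid (the census data in the text are polygons), the cell values are invented, and a density of exactly 10 is treated as dense.

```python
WIND_CLASS_MIN = 4    # wind class 4 or greater is suitable (from the text)
DENSITY_MAX = 10.0    # persons per square mile (from the text)


def reclassify(grid, predicate):
    """Reclassify a grid to 1 where predicate(value) holds, else 0."""
    return [[1 if predicate(value) else 0 for value in row] for row in grid]


def overlay_and(grid_a, grid_b):
    """Cell-by-cell overlay keeping cells that are 1 in both binary grids."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


# Hypothetical input layers on a common 3 x 3 grid.
wind_class = [[2, 4, 5],
              [3, 4, 6],
              [1, 2, 4]]
pop_density = [[5.0, 50.0, 8.0],
               [2.0, 9.0, 120.0],
               [1.0, 3.0, 10.0]]

good_wind = reclassify(wind_class, lambda v: v >= WIND_CLASS_MIN)
sparse = reclassify(pop_density, lambda d: d < DENSITY_MAX)
suitable = overlay_and(good_wind, sparse)
print(suitable)  # [[0, 0, 1], [0, 1, 0], [0, 0, 0]]
```

Keeping the thresholds as named constants makes it easy to rerun the analysis when the justification for a cutoff changes.
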

Network Analysis


Networks are common in our lives. Roads, powerlines, telephone and
television cables, and water distribution systems are all examples of
networks we utilize many times each day (Figure 9-49). Networks must be
effectively managed because they are crucial to civilization and represent
substantial investments. Spatial analysis tools have been developed to
help us design, use, and maintain networks.

A network may be defined as a set of connected features, often termed
nodes or centers. These features may be centers of demand, centers of
supply, or both (Figure 9-50). Centers are connected to at least one and
possibly many network links. Links interconnect and provide paths between
centers. Traveling from one center to another often requires traversing
many separate links.

Network analyses, also known as network models, are used to represent and
analyze the cost, time, delivery, and accumulation of resources along
links and between the connected centers. Resources flow to and from the
centers through the networks. In addition, resources may be generated or
absorbed by the links themselves.

The links that form the networks may have attributes that affect the flow.
For example, there may be links that slow or speed up the flow of
resources, or a link may allow resources to flow in only one direction.
Link attributes are used to model flow characteristics of the real
network; for example, travel on some roads is slower than others, or cars
may legally move in only one direction on a one-way street.

The concept of a transit cost is key to many network analysis problems. A
transit cost reflects the price one pays to move a resource through a
segment of the network. Transit costs are typically measured in time,
distance, or monetary units; for example, it costs 10 seconds to travel
through a link. A transit cost may be constant, so that it takes 10
seconds to traverse the link regardless of direction or time of day.
Alternatively, costs may vary by time of day or direction, so it may take
15 seconds to traverse an arc during morning and evening rush hours, but
10 seconds otherwise, or it may take twice as long to travel north to
south than to travel south to north.
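
A variable transit cost of this kind might be represented as a small function of the link, the hour, and the direction of travel. The sketch below is hypothetical throughout: the rush hour definition, the direction labels, and the choice to apply both adjustments together are assumptions made for illustration.

```python
RUSH_HOURS = set(range(7, 9)) | set(range(16, 18))  # assumed: 7-9 a.m. and 4-6 p.m.


def transit_cost(base_seconds, hour, direction):
    """Seconds needed to traverse one link, given hour of day (0-23) and direction."""
    cost = float(base_seconds)
    if hour in RUSH_HOURS:
        cost *= 1.5   # 10 seconds becomes 15 during rush hours
    if direction == "north_to_south":
        cost *= 2.0   # twice as long in the slower direction
    return cost


print(transit_cost(10, hour=12, direction="south_to_north"))  # 10.0
print(transit_cost(10, hour=8, direction="south_to_north"))   # 15.0
print(transit_cost(10, hour=12, direction="north_to_south"))  # 20.0
```
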

We will discuss three types of problems that are commonly analyzed using
networks: route selection, resource and territory allocation, and traffic
modeling. There are many other types of network problems, but these three
are among the most common and provide an indication of the methods and
breadth of network analyses.

Route selection involves identifying a "best" route based on a specified
set of criteria. Route selection is often applied to find the least costly
route that visits a number of centers. Two or more centers are identified
within a network, including starting and ending centers. These centers
must all be visited by traversing the network. There are usually a very
large number of alternative routes, or pathways, that may be used to visit
all centers. The best route is selected based on some criteria, usually
the shortest, quickest, or least costly route. Further restrictions may be
placed on the route; for example, the order in which centers are visited
may be specified.

Route selection may be used to improve the movement of public
transportation through a network. School buses are often routed using
network analyses. Each bus must start and finish at a school (a center)
and pick up children at a number of stops (also centers). The shortest
path or time route may be specified. Alternate routes are analyzed and the
"best" route selected.

Selection of the best route involves an algorithm that recursively follows
a least-cost set of arcs, beginning at the current node. A set of
interconnected network links is identified, as well as start and
destination centers (Figure 9-51). The route from start to destination
locations is typically built iteratively. One route finding algorithm adds
the least-cost link at each step. Multiple paths are tested until a path
connects the start and destination centers.

This simple method begins at the start center. Paths are extended by
adding the link that gives the lowest total cost for all paths currently
pursued. The initial set of candidate links consists of all those
connecting to the starting point. The lowest cost link is added, as shown
in Figure 9-52. The link with a value of six is chosen. Now the set of
candidate links consists of any link connected to this selected link (the
two links with costs of 15 and 8, respectively), plus any connected to the
starting point. All paths are examined, and the link added that gives the
lowest total path length. In Figure 9-52b, two links are added. Note that
the links added are not connected to the initially selected link. This
would have given a total cost of 14 (6 plus 8) or 21 (6 plus 15), while
the selected links give a lower path cost of 12. Now, the candidate links
are those connected to any of the selected links or to the start point.
Since all links from the start point have been selected, only those
connected to candidate links are examined. Of these, the lowest cost path
is added. The link with a cost of 8 that is attached to the initially
selected link is chosen (Figure 9-52). The candidate set expands
accordingly, and is evaluated again. Verify that the links shown in Figure
9-52d and Figure 9-52e should be the next, cumulative low cost paths
selected. This method is used until the destination is reached, and the
least-cost path identified (Figure 9-53).

Creating the least-cost path

Figure 9-53: Least-cost path for the example route finding algorithm described in the text.
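
The path-extension method described in the text closely resembles what is commonly called Dijkstra's algorithm: at each step, the candidate with the lowest accumulated cost is extended. The sketch below uses that algorithm with a priority queue. The network, node names, and link costs are hypothetical and do not reproduce Figure 9-52.

```python
import heapq


def least_cost_path(links, start, destination):
    """Return (total_cost, path) for the cheapest route, or None if unreachable.

    links maps each node to a dict of {neighbor: link_cost}.
    """
    frontier = [(0, start, [start])]  # (accumulated cost, node, path so far)
    settled = set()                   # nodes whose least cost is already known
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == destination:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(frontier, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical undirected network, each link listed in both directions.
links = {
    "start": {"a": 6, "b": 12},
    "a": {"start": 6, "c": 8, "d": 15},
    "b": {"start": 12, "d": 4},
    "c": {"a": 8, "dest": 9},
    "d": {"a": 15, "b": 4, "dest": 5},
    "dest": {"c": 9, "d": 5},
}
print(least_cost_path(links, "start", "dest"))  # (21, ['start', 'b', 'd', 'dest'])
```

As in the text, the cheapest first link (cost 6) is explored first but does not end up on the final route, because the accumulated cost through it (23) exceeds the alternative (21).
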

Many different pathfinding algorithms have been developed, most of which
are much more sophisticated than the one described above. Note that the
described pathfinding algorithm has a rapidly expanding number of links to
evaluate at each step. Computational burdens increase accordingly. A
subset of all possible candidate paths may be examined because it becomes
too computationally time-consuming to examine all possible paths. Most
pathfinding algorithms periodically review the total accumulated cost for
each candidate path, and stop following some highest cost or least
promising paths.


There are many variations on this route finding problem. There may be
multiple centers that must be visited in a specific order and carriers
defined to transport specific amounts to or from centers. Centers may add
to or subtract from a carrier; for example, some centers might represent
houses with children, other centers may represent schools, and carriers
represent buses that transport children. Houses must be visited to pick up
children, but a bus has a fixed capacity. These children must be
transported to the school, and there may be time constraints; for example,
children cannot be picked up before 7 a.m. and must be at school by 7:55
a.m. Network-based route selection has been successfully used to solve
these and related problems.


Resource allocation problems involve the apportionment of a network to
centers. One or more allocation centers are defined in a network.
Territories are defined for each of these centers. Territories encompass
links or non-allocation centers in the network. These links or
non-allocation centers are assigned to only one allocation center. The
features are usually assigned to the nearest center, where distance is
measured in time, length, or monetary units.


Resource allocation algorithms may be similar to route finding algorithms
in that the distance out from each center is calculated along each path.
Each center or arc is assigned to the nearest or least-cost center. The
route finding method is exhaustive in resource allocation, in that all
routes are pursued, not just the least-cost route. The routes are measured
outward from each allocation center (Figure 9-54).
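
One way to sketch this allocation is to grow least-cost paths outward from all allocation centers at once, assigning each node to whichever center reaches it first. This is an assumed, simplified approach (a multi-source variant of a least-cost search) that allocates nodes rather than links; the network and names are hypothetical.

```python
import heapq


def allocate(links, centers):
    """Assign every reachable node to its least-cost allocation center.

    Returns {node: (center, cost_from_center)}.
    links maps each node to a dict of {neighbor: link_cost}.
    """
    assigned = {}
    frontier = [(0, center, center) for center in centers]  # (cost, node, center)
    heapq.heapify(frontier)
    while frontier:
        cost, node, center = heapq.heappop(frontier)
        if node in assigned:
            continue  # already claimed by a nearer center
        assigned[node] = (center, cost)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in assigned:
                heapq.heappush(frontier, (cost + link_cost, neighbor, center))
    return assigned


# Hypothetical chain of nodes between two allocation centers, s1 and s2.
links = {
    "s1": {"a": 2},
    "a": {"s1": 2, "b": 3},
    "b": {"a": 3, "c": 5},
    "c": {"b": 5, "s2": 1},
    "s2": {"c": 1},
}
territories = allocate(links, ["s1", "s2"])
print(territories["a"], territories["b"], territories["c"])
# ('s1', 2) ('s1', 5) ('s2', 1)
```

Unlike the single route search, the loop here runs until the frontier is empty, so every reachable part of the network is measured.
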


Variations on resource allocation 
include setting a center capacity. The center 
capacity sets an upper limit on resources that 
may be encompassed by a territory. Links 
are assigned to the nearest center, but once 
the capacity is reached, no more are added. 
Maximum distance also serves to limit the
range of the territory from the center. Both 
of these restrictions may result in some unas- 
signed areas, that is, portions of the network 
that are not allocated to a center. 


Resource allocation analyses are used in many disciplines. School
districts may use resource allocation to assign neighborhoods to schools.
The type and number of dwellings in a district may be included as nodes on
a network. The number of children along each link is added until the
school capacity is reached. Resource allocation may also be used to define
sales territories, or to determine if a new business should be located
between existing businesses. If enough customers fall between the
territories of existing business centers, a new business between existing
business centers may be justified.


Traffic modeling is another oft-applied form of network analysis. Streets
are represented by a network of interconnected arcs and nodes. Attributes
associated with arcs define travel speed and direction. Attributes
associated with nodes identify turns and the time or cost required for
each turn. Illegal or impossible turns may be modeled by specifying an
infinite cost. Traffic is placed in the network, and movement modeled.
Bottlenecks, transit times, and underused routes may be identified, and
this information used to improve traffic management or build additional
roads.

Traffic modeling through networks is a subdiscipline in its own right. Due
to the cost and importance of transportation and traffic management, a
great deal of emphasis has been placed on efficient traffic management.
Transportation engineers, computer scientists, and mathematicians have
been modeling traffic via networks for many years. An in-depth discussion
of network analyses for traffic management may be found in literature
listed at the end of this chapter.

Geocoding


Geocoding, also known as linear referencing, is another common application
of spatial network analysis. Geocoding is the process of spatially
referencing point features based on the address of the feature and
knowledge of an address range for the linear network. Geocoding is
commonly applied to business sales, marketing, vehicle dispatch and
delivery operations, and organizing censuses and other government
information gathering and dissemination activities.


Geocoding requires a set of addresses associated with a set of linear
features like roads. Typically, at least the starting and ending addresses
for links in a network are known. These starting and ending addresses
define an address range, and the range is assumed to linearly span the
connecting line. Points on the line may be "geographic coded" (hence the
name geocoding) in that, given an address, we may calculate approximately
where the address should occur on the network link (Figure 9-55).


Geocoding: the address 321 ML King Drive is placed at the location that is

(321 - 301) / (359 - 301) \approx 0.34

of the distance from the 301 location toward the 359 location, between
Third and Fourth streets. Coordinate values are estimated to be
approximately

x_{321} = x_{301} + 0.34 (x_{359} - x_{301})
y_{321} = y_{301} + 0.34 (y_{359} - y_{301})

Figure 9-55: Geocoding estimates the location of an address by linear
interpolation along a line with a known address range.


Geocoded addresses are typically assumed to be arrayed uniformly along the
link. The start and end addresses are assumed to be at the ends of the
link. The estimated location of the geocoded address is based on a linear
interpolation, beginning at the starting address and adding a length
proportional to the offset of the address from the starting address,
divided by the address range (Figure 9-55). The estimated location may be
placed within the block or line segment.

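The linear interpolation can be written out directly. In the sketch below the address numbers come from the ML King Drive example, while the function name and the end-point coordinates are hypothetical values chosen for illustration.

```python
def geocode(address, start_address, end_address, start_xy, end_xy):
    """Estimate (fraction, (x, y)) for an address by linear interpolation along a link."""
    if start_address == end_address:
        raise ValueError("address range has zero length")
    low, high = sorted((start_address, end_address))
    if not low <= address <= high:
        raise ValueError("address is outside the range for this link")
    # Fraction of the way from the start address to the end address.
    fraction = (address - start_address) / (end_address - start_address)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return fraction, (x, y)


# Address 321 on a block running from 301 to 359; coordinates are made up.
fraction, location = geocode(321, 301, 359, (100.0, 200.0), (158.0, 200.0))
print(round(fraction, 2))  # 0.34
```

The result is only as good as the assumption that addresses are evenly spaced along the link.
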
Because geocoding only estimates where locations are, these locations may
contain substantial error. These errors may be larger than the error
associated with the linear features along which the geocoded addresses are
placed. Figure 9-56 illustrates some sources of error. Geocoding typically
involves a regular, linear interpolation of an address across an address
range. Address ranges are usually assigned ordinally, while the geocode is
an interval estimate. In Figure 9-56a, address 250 is not halfway between
200 and 300, and address 240 takes up an entire block. This
ordinal-interval mismatch may be particularly bad in rural areas, where
development over a long time period may result in substantial nonlinear
address arrangements. Figure 9-56b illustrates this, with address 1007
almost opposite address 1026, and numerous inconsistent intervals; for
example, the 22 address units between 1071 and 1093 are separated by a
shorter distance than the 12 address units between 995 and 1007. Also bear
in mind that there are some regions in the world where addresses are not
assigned in a linear fashion along streets, but may instead be numbered in
the order buildings were created or refer to a cardinal direction and
distance from the nearest intersection. These nonlinear addresses can
cause substantial confusion, so any application of geocoded data must
allow for these inconsistencies, or the data must be evaluated and
corrected.


Geocoding is often combined with network analyses to determine the
shortest path or travel time among a set of locations. Delivery locations
may be generated from a list of orders to a business. The locations of
these addresses are generated via geocoding. The locations may then be
entered into a network search algorithm and the optimal route planned.
Businesses save millions of dollars each year applying these basic spatial
analyses.

Summary


Spatial analysis, along with map production, is one of the most important
uses of GIS. Spatial analytical capabilities are often the reason we
obtain GIS and invest substantial time and money to develop a working
system. Any analytical operation performed on spatial or associated
attribute data may be considered as spatial analysis.


Spatial operations are applied to input data and generate output data.
Inputs may be one to many layers of spatial data, as well as nonspatial
data. Outputs may also number from one to many layers or scalar values.
Operations also have a spatial scope, the area of the input data that
contributes to output values. Scopes are commonly local, neighborhood, or
global.

Selection and classification are among the most oft-used spatial data
operations. A selection identifies a subset of the features in a spatial
database. The selection may be based on attribute data, spatial data, or
some combination of the two. Selection may apply set or Boolean algebra,
and may combine these with analyses of adjacency, connectivity, or
containment. A selected set may be classified in that variables may be
changed or new variables added that reflect membership in the selected
set.


Classifications may be assigned automatically,
but the user should be careful in
choosing the assignment. Equal-area,
equal-interval, and natural breaks
classifications are often used. The nature of
resulting classifications may depend
substantially on the frequency histogram of
the input data layer, particularly when
outliers are present.
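
The sensitivity to outliers can be seen in a small sketch of an
equal-interval classification. The function name and the sample values
are illustrative assumptions, not taken from any particular GIS package.

```python
def equal_interval_class(value, vmin, vmax, n_classes):
    """Return the 0-based class index for value using equal-width bins."""
    if vmax == vmin:
        return 0
    width = (vmax - vmin) / n_classes
    # Clamp so that value == vmax falls in the last class.
    return min(int((value - vmin) / width), n_classes - 1)

# Hypothetical attribute values; 100 is an outlier.
values = [2, 3, 4, 5, 6, 7, 8, 9, 100]
classes = [equal_interval_class(v, min(values), max(values), 3)
           for v in values]
```

With these values, every observation except the outlier lands in the
first class and the middle class is empty, which is the kind of result
the frequency histogram would have warned about.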

A dissolve operation is often used in 
spatial analysis. Dissolves are routinely 
applied after a classification, as they 
remove redundant boundaries that may
slow processing. 

Proximity functions and buffers are
also commonly applied spatial data operations.
These functions answer questions
regarding distance and separation among
features in the same or different data layers.
Buffering may be applied to raster or vector
data, and may be simple (with a uniform
buffer distance), or complex (with
multiple nested buffers or variable buffer
distances).

Overlay involves the vertical combina- 
tion of data from two or more layers. Both 
geometry (coordinates) and attributes are
combined. Any combination of points,
lines, and area features is possible,
although overlays involving at least one
layer of area features are most common.
The results of an overlay usually take the
lowest geometric dimension of the input 
layers. 

Overlay sometimes creates gaps and 
slivers. These occur most often when a 
common feature occurs in two or more layers.
These gaps and slivers may be
removed by several techniques. 


Network models may be temporally
dynamic or static, but they are constrained to
model the flow of resources through a connected
set of linear and point features. Traffic
flow, oil and gas delivery, or electrical
networks are examples of features analyzed
and managed with network models. Route
finding, allocation, and flow are commonly
modeled in networks.

Geocoding, or linear referencing, is
used to calculate approximate locations
along a linear segment when the endpoint
addresses are known. Often used in census
and delivery applications, geocoding works
best when addresses are uniformly spaced
across the segment. Because it is an
approximation, geocoded locations are
expected to sometimes be in error, and
these errors are often more frequent in rural
or sparsely addressed segments. Linear
referencing may also be used to locate
changes in linear feature characteristics, for
example, road surface or accident locations.
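
A minimal sketch of the interpolation behind geocoding follows. It
assumes a straight street segment with uniformly spaced addresses; the
function name, argument names, and coordinates are hypothetical.

```python
def geocode(address, start_addr, end_addr, start_xy, end_xy):
    """Linearly interpolate a position along a straight street segment."""
    low, high = min(start_addr, end_addr), max(start_addr, end_addr)
    if not low <= address <= high:
        raise ValueError("address is outside the segment's address range")
    if end_addr == start_addr:
        return start_xy
    # Fraction of the way along the segment, from 0.0 to 1.0.
    t = (address - start_addr) / (end_addr - start_addr)
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)

# Address 150 on a segment numbered 100 to 200 falls at the midpoint.
midpoint = geocode(150, 100, 200, (0.0, 0.0), (400.0, 0.0))
```

Because the estimate depends only on the endpoint addresses, any
irregular spacing of real buildings along the segment shows up directly
as positional error.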
Study Questions


9.1 - Define and give examples of local, neighborhood, and global spatial operations.

9.2 - Describe selection operations.

9.3 - Describe set and Boolean algebra.

9.4 - Write the simplest Boolean expressions that result in the grey area selections:

9.5 - Write the simplest Boolean expressions that result in the grey area selection:
9.7 - Perform the following reclassification:
Reclass by ID


9.8 - Reclassify the following polygons, according to the column Area, into small
(< 18,000), medium (18,000 to 45,000), and large (> 45,000).




9.9 - List and describe three different classification methods.


9.10 - What is the modifiable area unit problem (MAUP)? Why is it important? What 
is the zone effect, and what is the area effect? 


9.11 - What is a dissolve operation? What are they typically used for? 




9.12 - Perform a dissolve operation on the variable Type for the layer depicted below: 




9.13 - Perform a dissolve operation on the variable Type for the layer depicted below: 




9.14 - Draw the resultant polygon boundaries and complete the table in a dissolve
operation that calculates the sum of Count, based on Class. Label each polygon starting
with lowercase a, b, c... and enter the label in NewID in the output table for the
corresponding row.


NewID   Class   Count
9.15 - Draw the resultant polygon boundaries and complete the table in the dissolve
operation that calculates the average of Cost, based on Type. Label each output poly-


9.16 - Select the most appropriate characteristics for the buffer below. Is it simple,
multi-distance, or variable distance? Does it retain or dissolve intersections? Is it
interior or exterior?
9.17 - Sketch out the output from a variable distance buffer applied to the set of
points shown below. Draw output buffers that dissolve the boundaries between areas
that fall within multiple buffers.

9.18 - Sketch out the output from a variable distance buffer applied to the set of 
points shown below. Dissolve boundaries for intersecting buffers. 


9.19 - How are raster proximity functions different from vector proximity functions? 


9.20 - Why are output features in vector overlay typically set to the minimum dimen- 
sional order (point, line, or polygon) of the input features? 


9.21 - Complete the table for the vector point overlay shown below: 


[Figure: Layer 1 intersect Layer 2, to create Output]

9.22 - Complete the table for the vector point overlay shown below: 


[Figure: Layer 1 intersect Layer 2, to create Output]
9.23 - Sketch both the output polygons and the resultant attribute table from the
overlay shown below:


9.24 - Sketch both the output polygons and the resultant attribute table from the over- 
lay shown below: 




9.25 - Complete the table in the polygon union diagrammed below:


9.27 - What is the sliver problem in vector layer overlay? How might this problem be
resolved?


9.28 - Sketch the output of the raster overlay shown below, providing both cell values 
and the output table with ID and count variables. 



9.29 - Sketch the output of the raster overlay shown below, providing both cell values 
and the output table with ID and count variables. 




9.30 - Describe/define network models. What distinguishes them from other spatial
or temporal models? 


9.31 - What are the common uses for network models? Why are these models so 
important? 
9.32 - Calculate and record the geocoded address for the boxes labeled A, C, E, G, I,
K, M, O, and Q in the figure below:

9.34 - If you start with the layer on the left, and wish to create the layer on the right,
what would be the table you would use for the reclass operation at a), and what single
spatial operation would you use in b) to obtain the desired result?


9.35 - If you start with the layer on the left, and wish to create the layer on the right,
what would be the table you would use for the reclass operation at a), and what single
spatial operation would you use in b) to obtain the desired result?
10 Topics in Raster Analysis


Introduction 


Raster analyses range from the simple to
the complex, largely due to the early invention,
simplicity, and flexibility of the raster
data model. Raster cells can store nominal,
ordinal, or interval/ratio data, representing a
wide range of variables. Complex constructs
may be built from raster data, including
networks of connected cells, or groups of cells
to form areas.


The flexibility of raster analyses has
been amply demonstrated by the wide range
of problems they help solve. Raster analyses
may predict the fate of pollutants in the
atmosphere, disease spread, animal migration,
and crop yields. Time-varying and wide-area
phenomena are often analyzed using
raster data. Raster analyses are applied at a
range of scales, from fine-grained analyses,
for example, in U.S. EPA analysis of polluted
Superfund sites, to NASA global-scale
estimates of forest growth. Local, state, and
regional organizations use raster analyses at
many scales in between (Figure 10-1).

The long history of raster analyses has
yielded tools valuable to many GIS users.
Tools often share a conceptual basis and
may be adapted to several types of problems.
In addition, specialized raster
methods have been developed for less
frequently encountered problems. The GIS user
may more effectively apply raster data analysis
if she understands the underlying concepts
and knows a broad range of raster
analysis methods.

Raster analysis falls into different categories.
Map algebra is the foundation for
many raster workflows. It
draws on raster analyses spanning local,
neighborhood, zonal, or global scopes.
To these can be added more involved procedures
relating to buffering, overlays, and
cost surfaces, and many others besides.
Map Algebra

Map algebra is the cell-by-cell combination
of raster data layers. The combination
entails applying a set of local and neighborhood
functions, and to a lesser extent global
functions, to raster data.

The concept of map algebra is based on
the simple, flexible, and useful raster grid
structure. Simple operations may be applied
to each grid cell. Further, raster layers may
be combined through operations such as
layer addition, subtraction, and multiplication.

Map algebra entails operations applied 
to one or more raster data layers. Unary 
operations apply to one data layer. Binary 
operations apply to two data layers, and
higher-order operations may involve many 
data layers. 

A simple unary operation applies a function
to each cell in an input raster layer, and
records a calculated value to the corresponding
cell in an output raster. Figure 10-2a
illustrates the multiplication of a raster by a
scalar (a single number). Multiplying a raster
by 2 might be denoted by the equation:

Out_layer = In_layer * 2        (10.1)

Each cell value of In_layer is multiplied
by the scalar value 2, and the result placed in
the corresponding cell in Out_layer. Other
unary functions are applied in a similar manner;
for example, each cell may be raised to
an exponent, divided by a fixed number, or
converted to an absolute value.

Binary operations also involve cell-by-cell
application of operations or functions,
but they combine data from two raster layers.
Addition of two layers might be specified by:

Sum_layer = LayerA + LayerB        (10.2)

Figure 10-2: An example of raster operations. On the left side (a), each input cell is
multiplied by a scalar and the result stored in the corresponding output cell. The right
side (b) of the figure illustrates layer addition.

Figure 10-3: Local functions assign output values that depend on data only from the
corresponding input cell.


Figure 10-2b illustrates this raster addition
operation. Each value in LayerA is added
to the value found in the corresponding cell
in LayerB. These values are then placed in the
appropriate raster cell of Sum_layer. The
cell-by-cell addition is applied for the area
covered by both LayerA and LayerB, and the
results are placed in Sum_layer.
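
The unary and binary operations above can be sketched with rasters held
as plain lists of rows. The helper names local_unary and local_binary
are illustrative assumptions, not functions from any GIS package, and
both layers are assumed to share the same extent and cell size.

```python
def local_unary(layer, func):
    """Apply func to every cell of a single raster layer."""
    return [[func(v) for v in row] for row in layer]

def local_binary(layer_a, layer_b, func):
    """Apply func to each pair of corresponding cells in two layers."""
    return [[func(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

in_layer = [[1, 2], [3, 4]]
layer_a = [[1, 2], [3, 4]]
layer_b = [[10, 20], [30, 40]]

# Out_layer = In_layer * 2
out_layer = local_unary(in_layer, lambda v: v * 2)
# Sum_layer = LayerA + LayerB
sum_layer = local_binary(layer_a, layer_b, lambda a, b: a + b)
```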

As with vector operations, raster opera- 
tions may be categorized as local, neighbor- 
hood, or global. Local operations use only 
the data in a single cell to calculate an output
value (Figure 10-3). Neighborhood opera- 
tions use data from a set of cells, and global 
operations use all data from a raster data 
layer. 


Neighborhood operations gather input
values from a focal cell and adjacent cells to
calculate output values at the focal cell
(Figure 10-4). The neighborhoods may vary in
size and shape, often using the nearest 4 or 8
cells along with a center cell, but sometimes
using larger, different rectangular, circular,
or other shapes.
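
As a sketch of a neighborhood operation, the function below averages a
focal cell with its eight nearest neighbors. The name focal_mean and the
choice to shrink the neighborhood at the raster edge are assumptions
made for illustration; software packages handle edges in various ways.

```python
def focal_mean(layer, row, col):
    """Mean of the focal cell and its (up to 8) in-bounds neighbors."""
    values = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            # Skip positions that fall outside the raster.
            if 0 <= r < len(layer) and 0 <= c < len(layer[0]):
                values.append(layer[r][c])
    return sum(values) / len(values)

layer = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
center = focal_mean(layer, 1, 1)   # uses all nine cells
corner = focal_mean(layer, 0, 0)   # uses only the four in-bounds cells
```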

The concepts of local and neighborhood
operations are more uniformly specified
with raster data than with vector data. Cells
within a layer have a uniform size, so a local
operation covers a uniform area. In contrast,
vectors may have irregular
polygons with vastly different areas. A local
operation for a given raster is uniform in that
it specifies a particular cell size and dimension,
while a vector local operation entails
non-uniform areas within most layers.

Figure 10-4: Neighborhood functions use values from a focal cell and adjacent cells to
calculate an output cell value.

Neighborhood operations are also
more uniformly defined in raster than in vector
data sets. A raster neighborhood specifies a
fixed number of cells and arrangement; for
example, the neighborhood might be a cell
plus the eight surrounding cells. This neighborhood
has a uniform area and dimension.
Vector neighborhoods depend not only on
the shape and size of the target feature, but 
also on the shape and sizes of adjacent vec- 
tor features. 

Global operations in map algebra may
produce uniform output, or they may produce
different values for each raster cell
(Figure 10-5). Global operations may return
a single number, placed in every cell of the
output layer. The global maximum function
for a layer might be specified as:

Out_num = globalmax(In_layer)        (10.3)

This would assign a single value to Out_num.
The value would be the largest number
found when searching all the cells of
In_layer. This “collapsing” of data from a
two-dimensional raster may reduce the map
algebra to scalar algebra. Many other functions
return a single global value placed in
every cell for a layer, for example, the global
mean, maximum, or minimum.
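
A short sketch of the global maximum follows. The helper names
global_max and fill_like are hypothetical; the second simply writes the
single global value into every cell of an output layer.

```python
def global_max(layer):
    """Largest value found when searching all cells of the layer."""
    return max(v for row in layer for v in row)

def fill_like(layer, value):
    """Output layer with the same shape, every cell set to value."""
    return [[value for _ in row] for row in layer]

in_layer = [[3, 7], [12, 5]]
out_num = global_max(in_layer)
out_layer = fill_like(in_layer, out_num)
```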

Note that in our examples all layers have the
same extent, e.g., LayerA and LayerB in Figure
10-2 cover the same area. This may not
always be true. When layer extents differ,
most GIS software will either restrict the
operation to the area where input layers
overlap, or place a null or a "missing data"
indicator into cells where input data are lacking.
This number acts as a "no data" flag,
indicating there are no results. It is a unique
placeholder that indicates no valid data are
present.
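
One way to picture the "no data" flag is to let a null in either input
produce a null in the output. The sketch below uses Python's None as the
placeholder; this is an assumption for illustration, since real systems
often reserve a specific number instead.

```python
def add_with_nulls(a, b):
    """Add two cell values; a null (None) in either input gives null."""
    return None if a is None or b is None else a + b

layer_a = [[1, None], [3, 4]]
layer_b = [[10, 20], [None, 40]]
total = [[add_with_nulls(a, b) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(layer_a, layer_b)]
```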

Incompatible raster cell sizes cause
ambiguities when combining raster layers.
This problem was introduced in
Chapter 9 and is illustrated here. The left side of
Figure 10-6 shows a raster mismatch. Several
cells in Layer2 correspond to cell A in Layer1.
If these two layers are added, there are
several potential input values for Layer2
corresponding to one input value for Layer1. The
problem is compounded for cell B because a
portion of the cell is not defined for Layer2.

There is input ambiguity in most raster
operations. One might argue the best choice
uses the Layer2 cell with complete overlap, or the mean
or median number for all overlapping cells,
or some weighted average. This ambiguity
will arise whenever raster data sets are not
aligned or have different cell sizes. While
the GIS software may have a default method
for choosing the “best” input when cells
mismatch, these decisions may not be
universally best.

Figure 10-5: Global functions integrate input from an entire layer to calculate output values.

Figure 10-6: Incompatible cell sizes or grid edges should be harmonized via resampling prior to analysis.

Analysis is usually best served by a
prior resampling of the data into a compatible
coordinate system, using the transformation
and resampling methods described in
Chapter 4. The analyst may select a template
or standard layer in a specified coordinate
system, with specified starting coordinates
such as the lower-left cell values (Figure 10-6).
With fixed dimension and starting
point, cells unambiguously match across
layers.
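
A minimal sketch of one resampling choice follows: a fine raster is
aggregated to larger cells by averaging each block. It assumes both
grids share the same origin and that the coarse cell size is a whole
multiple of the fine one; the name aggregate_mean is hypothetical, and
averaging is only one of the options discussed above.

```python
def aggregate_mean(layer, factor):
    """Resample to cells `factor` times larger by averaging each block."""
    rows, cols = len(layer), len(layer[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the fine cells that fall inside this coarse cell.
            block = [layer[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

fine = [[1, 3, 5, 7],
        [1, 3, 5, 7],
        [2, 2, 8, 8],
        [2, 2, 8, 8]]
coarse = aggregate_mean(fine, 2)
```

Once both layers are on the coarse grid, each cell in one layer matches
exactly one cell in the other, and cell-by-cell operations are
unambiguous.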
Local Functions


There is a broad number of local functions
(or operations) that can be conveniently
placed in one of four classes:
mathematical functions, Boolean or logical
operations, reclassification, and multilayer
raster overlay.


Mathematical Functions


We may generate a new data layer by
applying mathematical functions on a
cell-by-cell basis to input layers (Figure 10-2,
Table 10-1). Any number of inputs and
outputs may be supported, depending on the
function.

A broad array of mathematical functions
may be used, with a few constraints. Raster
data value and type are perhaps the most
common constraints. Most raster models
store one data value in a cell. Each raster
data set has a data type and maximum size
that applies to each cell; for example, a
two-byte signed integer may be stored.
Mathematical operations that create noninteger
values, or values larger than 32,767 (the
largest value a two-byte signed integer can
hold), may not be
stored accurately in a two-byte integer output
layer. Most systems will do some form
of automatic type conversion, but there are
often limits on the largest values that can be
stored, even with automatic conversion.

Although the set of functions and function
names differ among software packages,
nearly all packages support the basic arithmetic
operations of addition through division,
and most provide the trigonometric
functions and their inverses (e.g., sin, asin).
Truncation, power, and modulus functions
are also commonly supported, and vendors
often include additional functions they perceive
to be of special interest. These mathematical
functions are often applied in raster
analysis, for example, when multiplying
each cell by 3.28 to convert height values
from meters to feet.
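
The unit conversion and the data type constraint can be combined in one
sketch. It assumes the output layer stores two-byte signed integers and
that out-of-range results are written as null (None); the function name
to_feet_int16 and the sample heights are illustrative only.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_feet_int16(meters):
    """Convert meters to whole feet; None if it cannot fit in 16 bits."""
    feet = round(meters * 3.28)
    if not INT16_MIN <= feet <= INT16_MAX:
        return None
    return feet

# Hypothetical height values in meters.
heights_m = [[100, 8849], [9990, 10000]]
heights_ft = [[to_feet_int16(v) for v in row] for row in heights_m]
```

Note that rounding discards the fractional part of each converted
height, which is the loss of accuracy the text warns about for
noninteger results.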


Note that although many systems will
let you perform these operations on any type
of raster data, they often only make sense for
interval/ratio data, and may return erroneous
results when applied to nominal or ordinal
data. Numbers may be assigned to indicate
population density by high, medium, and
low, and while the sin function may be
applied to these data, the results will usually
have little meaning.




Logical Operations

There are many local functions that
apply logical (also known as Boolean) operations
to raster data. A logical operation typically
involves the comparison of a cell to a
scalar value or set of values, and outputs a
“true” or a “false” value. True is often represented
by an output value of 1, and false by
an output value of 0.

There are three basic logical operations,
AND, OR, and NOT (Figure 10-7). The
AND and OR operations require two input
layers. These layers serve as a basis for comparison
and assignment. AND requires both
input values be true for the assignment of a
true output. Nonzero values are typically
considered to be true, and zeros false. Note
that output values are typically restricted to
1 and 0 even though there
may be a range of input values. Also note
that there may be cells where no data are
recorded. The treatment of these cells depends
on the specific GIS system. Most systems
assign null output when any input is null;
others assign false values when any input is
null.

Figure 10-7a shows an example of the
OR operation. This cell-by-cell comparison
assigns true to the output if either of the
corresponding input cells is true. Note that
cells in either layer or both layers may be
true for a true assignment, and that in this
example, null values (N) are assigned when
either of the inputs is null. Some implementations
assign a true value to the output cell
if any of the inputs is non-null and nonzero;
the reader should consult the manual for the
specific software tool they use.

Figure 10-7 shows an example of the
NOT operation. This operation switches true
for false, and false for true. Note that null
input assigns null output.

Finally, note that many systems provide
an XOR operation, known as an eXclusive
OR (not illustrated in our examples). This is
similar to an OR operation, except that true
values are assigned to the output when only
one or the other of the inputs is true, but not
when both inputs are true. This is a more
restrictive case than the general OR, and
may be used in instances when we wish to
distinguish among origins for a true assignment.
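
The four logical operations can be sketched as follows. The sketch
assumes nonzero values are true, zeros are false, and a null (None) in
any input gives a null output; as noted above, some packages treat nulls
differently. All function names here are illustrative.

```python
def truth(v):
    """Nonzero is true, zero is false; None (null) stays None."""
    return None if v is None else v != 0

def logical_and(a, b):
    ta, tb = truth(a), truth(b)
    return None if ta is None or tb is None else int(ta and tb)

def logical_or(a, b):
    ta, tb = truth(a), truth(b)
    return None if ta is None or tb is None else int(ta or tb)

def logical_xor(a, b):
    ta, tb = truth(a), truth(b)
    return None if ta is None or tb is None else int(ta != tb)

def logical_not(a):
    ta = truth(a)
    return None if ta is None else int(not ta)

def apply_cells(func, *layers):
    """Apply func cell by cell across one or more equally sized layers."""
    return [[func(*cells) for cells in zip(*rows)]
            for rows in zip(*layers)]

layer_a = [[1, 0], [5, None]]
layer_b = [[1, 1], [0, 3]]
and_out = apply_cells(logical_and, layer_a, layer_b)
or_out = apply_cells(logical_or, layer_a, layer_b)
xor_out = apply_cells(logical_xor, layer_a, layer_b)
not_out = apply_cells(logical_not, layer_a)
```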


Logical operations may be provided
that perform ordinal or equality comparisons,
or that test if cell values are null (Figure 10-8).
Ordinal comparisons include less
than, greater than, less than or equal to,
greater than or equal to, equal, and not equal.
Examples of these logical comparisons are
shown in Figure 10-8a and b, respectively.
These operations are applied cell-by-cell,
and the corresponding true or false output
assigned. As shown in Figure 10-8a, the upper
left cell of the first layer is not
less than the corresponding cell in the second
input layer, so a 0 (false) is assigned to the
upper left cell in the output layer. The upper
right cell in the first layer is less than the
corresponding cell in the second input layer,
resulting in the assignment of 1 (true) in the
output layer.

[Figure 10-8 panels: less than, equal, and ISNULL operations, each with Input and Output layers]


We often need to test for missing or
unassigned values in a raster data layer. The
operation has no standard name, and may be
variously called ISMISSING, ISNULL, or
some other descriptively named function. The
operation tests each cell for a null value,
shown as N in Figure 10-8. A 0 is assigned
to the corresponding output cell if a non-null
value is found, otherwise a 1 is assigned.
These tests for missing values are helpful
when identifying and replacing missing data,
or when determining the adequacy of a data
set and identifying areas in need of additional
sampling.
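
A comparison and a null test can be sketched in a few lines. The cell
values below are made up rather than copied from Figure 10-8, None
stands in for null, and the choice to return null from a comparison
with a null input is an assumption.

```python
def less_than(a, b):
    """1 if a < b, 0 if not; null (None) if either input is null."""
    return None if a is None or b is None else int(a < b)

def is_null(a):
    """1 where the cell is null, 0 where a value is present."""
    return int(a is None)

first = [[5, 1], [None, 2]]
second = [[5, 3], [4, 2]]
lt_out = [[less_than(a, b) for a, b in zip(row_a, row_b)]
          for row_a, row_b in zip(first, second)]
null_out = [[is_null(a) for a in row] for row in first]
```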

Figure 10-9 shows an example of a logical
comparison among two data layers. The
left and central panels show land cover for
an agricultural area, with three categories:
corn (1), soybeans (2), and all others (0). We
may be interested in identifying acres that
were rotated between these two crops, or
from these two crops to other crops over the
2009-2010 time period. The logical equal
comparison between these layers reveals
areas that have changed. If the cell values
are not equal across the years, the logical
equal comparison will return a value of 0,
while areas that remain the same will
maintain the value of 1. Further logical comparisons,
using class values, could identify how
much each of the component crop types had
changed.

Logical operations may be
applied to interval/ratio or categorical data,
although ordinal comparisons
should be carefully applied to categorical
data. For example, in our crop types example
in Figure 10-9, soybeans are assigned a
value of 2, and are “larger” than corn, but
this distinction does not imply that soybeans
are somehow two times larger, more valuable,
or anything other than just different
from corn.

[Figure 10-9 legend: unchanged, changed]
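
The crop rotation comparison can be sketched with two small rasters.
The cell values are invented for illustration and do not reproduce
Figure 10-9; the codes follow the text (corn = 1, soybeans = 2, all
others = 0).

```python
cover_2009 = [[1, 1, 2],
              [2, 0, 0]]
cover_2010 = [[2, 1, 2],
              [1, 0, 1]]

# Logical equal: 1 where the class is unchanged, 0 where it changed.
unchanged = [[int(a == b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(cover_2009, cover_2010)]
changed_cells = sum(row.count(0) for row in unchanged)

# A further comparison on class values: cells going from corn to soybeans.
corn_to_soy = sum(1
                  for row_a, row_b in zip(cover_2009, cover_2010)
                  for a, b in zip(row_a, row_b)
                  if a == 1 and b == 2)
```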
Reclassification


Raster reclassification assigns output 
values that depend on the specific set of 
input values. Assignment is most often 
defined by a table, ranges of values, or a 
conditional test. 


Raster reclassification by a table is
based on matching input cell values to a
reclassification table (Figure 10-10). The
reclassification table specifies the mapping
between input values and output values.
Each input cell value is compared to entries
for an “in” column in the table. When a
match is found, the corresponding “out”
value is assigned to the output raster layer.
Unmatched input values can be handled in
one of several ways. The most logically consistent
manner is to assign a null value, as
shown in Figure 10-10a for the input value
of -1. Some software simply assigns the
input cell value when there is no match. As
with all spatial processing tools, the specifics
of the implementation must be documented
and understood.

Figure 10-10b illustrates a reclassification
by a range of values. This process is
similar to a reclassification by a table, except
that a range of values appears for each entry
in the reclassification table. Each range
corresponds to an output value. This allows a
more compact representation of the reclassification.
A reclassification over a range is
also a simple way to apply the automated
reclassification rules discussed at length in
Chapter 8: the equal interval, equal area,
natural breaks, or other automated class-creation
methods. These automated assignment
methods are often used for raster data sets
because of the large number of values they
contain.
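
Both forms of reclassification can be sketched briefly. The sketch
assigns null (None) to unmatched inputs, one of the conventions
described above, and treats each range as including its lower bound but
not its upper bound; that convention, the function names, and the
sample values are assumptions.

```python
def reclass_by_table(layer, table):
    """Map each cell through an in -> out table; unmatched cells are null."""
    return [[table.get(v) for v in row] for row in layer]

def reclass_by_range(layer, ranges):
    """ranges holds (low, high, out) entries; low inclusive, high exclusive."""
    def lookup(v):
        for low, high, out in ranges:
            if low <= v < high:
                return out
        return None
    return [[lookup(v) for v in row] for row in layer]

ids = [[1, 2], [3, -1]]
by_table = reclass_by_table(ids, {1: 10, 2: 20, 3: 30})

rainfall = [[120, 480], [500, 1300]]
ranges = [(0, 500, 1), (500, 1000, 2), (1000, float("inf"), 3)]
by_range = reclass_by_range(rainfall, ranges)
```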

Data can also be reclassified to select
the input source based on a condition. These
“conditional” functions have varying syntax,
but typically require a condition that results
in a true or false outcome. The value or
source layer assigned for a true outcome is
specified, as is the value or source layer
assigned for a false outcome. An example of
one conditional function may be:

Output = CON (test, out if true, out if false)        (10.4)

where CON is the conditional function, test is
the condition to be tested, out if true defines
the value assigned if the condition tests true,
and out if false defines the value assigned if
the condition tests false (Figure 10-11).
Readers familiar with computer programming
languages will recognize CON as a kind
of if-then statement. Note that the value that
is output may be a scalar value, for example,
the number 2, or the value output may come
from the corresponding location in a specified
raster layer. The condition is applied on
a cell-by-cell basis, and the output value
assigned based on the results of the conditional
test.
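
A conditional function of this kind might be sketched as below. The
names con and cell are hypothetical stand-ins rather than any package's
API, and the condition is supplied as a raster of 1 (true) and 0
(false) values produced by a prior comparison.

```python
def cell(source, r, c):
    """Read a cell from a raster layer, or return a scalar unchanged."""
    return source[r][c] if isinstance(source, list) else source

def con(condition, out_if_true, out_if_false):
    """Pick each output cell from one of two sources based on condition."""
    return [[cell(out_if_true, r, c) if flag else cell(out_if_false, r, c)
             for c, flag in enumerate(row)]
            for r, row in enumerate(condition)]

layer_a = [[1, 5], [2, 7]]
layer_b = [[10, 20], [30, 40]]

# Condition: LayerA < 3, evaluated cell by cell.
test = [[int(v < 3) for v in row] for row in layer_a]
# True cells take values from LayerB; false cells take the scalar 0.
output = con(test, layer_b, 0)
```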


Figure 10-11: Output = CON (LayerA < 3, LayerB, LayerC). Output values are assigned
based on a conditional test.
Nested Functions

Local functions may be nested in analyses.
Functions are nested when a function is
used as the argument of another function.
For example, we may wish to take the natural
logarithm (LN) of all the cells in a layer.
The mathematical LN function is only
defined for positive values. When inputs are
negative, we need to either accept null values
in the output data layer, or process these
input cells in a different manner. We could
do this by applying the absolute value function
(ABS) to create an intermediate output,
and then applying LN to the intermediate
output for our final layer. This could be
described as the equations:

IntOutput = ABS (InputLayer)        (10.5)

FinalOutput = LN (IntOutput)        (10.6)

We could do the same thing by nesting
the functions, if allowed by the GIS software:

FinalOutput = LN (ABS (InputLayer))        (10.7)

Figure 10-12: A nested function, Output = CON (ISNULL(LayerA), LayerB, LayerC). Note the
Output values are taken from LayerB for the cells in LayerA that have null (N) values.

Figure 10-12 shows another example of
a nested function. Output values are assigned
from two different input layers. Cell values
are assigned from LayerB when LayerA values
are null, and from LayerC when LayerA values
are non-null. This might be desirable if we
have an incomplete but otherwise high-quality
data set, and we wish to fill missing values
from the next best available data. Map
algebraic expressions with nested functions
can become quite complex, but also may be
quite effective and efficient in solving complex
problems.
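
Both nested examples can be sketched in a few lines. The function names
are hypothetical. Because the logarithm of zero is undefined, the sketch
also returns null (None) for zero cells; that choice is an added
assumption, not something specified in the text.

```python
import math

def ln_abs(layer):
    """LN(ABS(cell)) for each cell; null and zero cells give null."""
    return [[None if v is None or v == 0 else math.log(abs(v))
             for v in row]
            for row in layer]

def con_isnull(layer_a, layer_b, layer_c):
    """CON(ISNULL(LayerA), LayerB, LayerC), cell by cell."""
    return [[b if a is None else c for a, b, c in zip(ra, rb, rc)]
            for ra, rb, rc in zip(layer_a, layer_b, layer_c)]

final_output = ln_abs([[-1, 1], [0, None]])

layer_a = [[None, 1], [2, None]]
layer_b = [[9, 9], [9, 9]]
layer_c = [[5, 6], [7, 8]]
output = con_isnull(layer_a, layer_b, layer_c)
```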


Raster Clip and Overlay 


Clip (or subset extraction) is another
common type of local raster function (Figure
10-13). Source and template data layers are
specified and an output data layer created.
This output layer contains only the values of
the source that are indicated by the template
layer. The nature of the template and output
data layer values depend on the specific
implementation of the raster extraction.
Template values are typically assigned a
value of 1 for those cells that are to pass
through to the output, and a 0 or null value
for those that are to be ignored. Output values
for the clipped area are copied from the
source, while output values for the area outside
the clipped region are typically assigned
a null value, or the value 0.

Care must be taken to ensure there are
no ambiguous cells created by this convention.
For example, if there are null values in
both the source data layer and the area outside
the clip, one cannot be certain if the
nulls come from the source or indicate a
region outside the clip area. Special coding
or other provisions can be used to avoid
these ambiguities.

A clip using raster data may also be
implemented as a reclassification and then a
multiplication. Note that in Chapter 9, we
described the vector clip function as a special
case of overlay in which the attributes
and interior geometry were saved based on
the boundaries in a clipping layer. This clipping
layer serves as an outline or area template
for which data are retained. In a raster
clip, the clipping layer may be represented
by a set of cells with values of 1 embedded
in nonclipping cells with values of 0.

Figure 10-14 illustrates a raster clip
operation that is a combination of cell
reclassification and multiplication. The first step is
to identify the set of values that defines the
clip area. This is the portion of the input data
layer to be transferred to the output data
layer. Individual cell values or cell values
over an interval or range may be defined.
These may come from a selection based on
raster values, from a list of values, or from a
previous spatial operation such as a buffer.
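
A clip built from a reclassification followed by a multiplication might
be sketched as below. A zone layer is reclassified to 1 for cells to
keep and 0 elsewhere, and the result is multiplied by the source layer.
The function name, layer names, and values are hypothetical.

```python
def make_mask(layer, keep_values):
    """Reclassify to 1 for cells in the clip area and 0 elsewhere."""
    return [[1 if v in keep_values else 0 for v in row] for row in layer]

zones = [[1, 1, 2],
         [3, 2, 2]]
source = [[50, 60, 70],
          [80, 90, 100]]

mask = make_mask(zones, {2})
# Multiplying by 1 passes a value through; multiplying by 0 discards it.
clipped = [[m * s for m, s in zip(mask_row, source_row)]
           for mask_row, source_row in zip(mask, source)]
```

Cells outside the clip area receive 0 here, so a genuine source value
of 0 inside the clip area cannot be told apart from a discarded cell.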

A clip template is created that defines a
binary mask, a set of cells that “mask” out a
portion of an input layer. Cells to be passed
through to the output layer are set to the
value 1 (Figure 10-14). Cells to be “clipped
away” are set to the value 0. The clip template
or layer is then multiplied by the input
raster, yielding an output raster. Cell-by-cell
multiplication by 1 passes values through to
the output layer. Multiplication by 0 discards
values for the cell, resulting in a raster clipped
to the area of interest.

Raster overlay combines features from
two or more data layers, and is among the
most useful of spatial functions. The features
in raster data correspond to cells, or perhaps
groups of cells with the same values, but as
with vector overlay, great utility is often
gained from combining data from different
layers.

There are some differences between raster
and vector overlay due to the differences
in the data model. Raster overlay is often
restricted to nominal data. The cell values do
not typically represent continuous variables
such as temperature, but rather categorical
variables such as type or township name.
Although raster overlay may be implemented
so that it admits continuous numbers,
this typically results in too many
unique cell combinations to be of much
value. If continuous data are used, they are
often converted to categories first; for example,
rainfall may be assigned to low,
medium, or high classes.

Raster overlay involves the cell-by-cell comparison of values in two
or more layers (Figure 10-15). The values in each input data layer are
associated with a specific combination of additional variables, and
these additional variables may be recorded in an attribute table. Each
unique combination of cells from the two layers is identified, and
assigned a new identifier (Out-ID) in the Output layer. Note the two
input attribute tables are combined in a corresponding fashion. In
Figure 10-15 you can see the upper left corner of the Output layer has
the corresponding type and name attribute values from Input layer 1,
and the ID and cost attribute values from Input layer 2.
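
The overlay logic can be sketched as follows, assuming each layer stores an integer identifier per cell and a separate attribute table keyed by that identifier. The function names, the example grids, and the attribute values are hypothetical; real software may number the output combinations differently.

```python
def overlay(layer1, layer2):
    """Assign a new identifier to each unique (layer1, layer2) cell combination."""
    combo_ids = {}
    output = []
    for row1, row2 in zip(layer1, layer2):
        out_row = []
        for a, b in zip(row1, row2):
            if (a, b) not in combo_ids:
                combo_ids[(a, b)] = len(combo_ids) + 1  # next unused output ID
            out_row.append(combo_ids[(a, b)])
        output.append(out_row)
    return output, combo_ids


def join_attributes(combo_ids, table1, table2):
    """Combine the two input attribute tables, one row per output identifier."""
    return {out_id: {**table1[a], **table2[b]}
            for (a, b), out_id in combo_ids.items()}


# Hypothetical input layers and attribute tables.
layer1 = [[1, 1, 2],
          [1, 2, 2],
          [3, 3, 2]]
layer2 = [[10, 10, 10],
          [20, 20, 10],
          [20, 20, 20]]
table1 = {1: {"type": "a"}, 2: {"type": "b"}, 3: {"type": "c"}}
table2 = {10: {"cost": 5}, 20: {"cost": 8}}

output, combo_ids = overlay(layer1, layer2)
attributes = join_attributes(combo_ids, table1, table2)
```

Several output cells share each row of `attributes`, which is the many-to-one relationship between cells and attribute rows.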


Figure 10-15: Raster overlay, with Input layer 1, Input layer 2, and
the Output layer, which records each cell combination of the input
layers.


Recall that in many implementations of the raster data model there is
a many-to-one relationship between the raster cells and the attribute
rows. This occurs because multiple cells correspond to each row. Also
note that the cells may form disjunct regions of the same type; for
example, Figure 10-15 shows a cell of type o in the lower left corner
of Input layer 1 that is not contiguous with the rest of the o cells
in Layer 1. This combination carries through to the output, where
there are disjunct groups of cells with Out-ID values of 6.

Fuzzy Membership and Raster Overlay


Raster data may be reclassified into fuzzy sets, typically to
represent uncertainty or ambiguity in set membership. For example,
land cover mapping requires assigning distinct classes like water or
upland. There may be some uncertainty if an area near a shoreline is
water or upland. Similarly, land cover along a suburban/rural edge may
be ambiguous, because misinterpreting a lawn for a hayfield may change
a cell assignment from one class to another. Similar ambiguities due
to data source or conversion or assignment methods may result in
fuzziness in raster values. We sometimes wish to reflect this
ambiguity.


We may represent uncertainty in our measurement with fuzzy membership
values. Suppose we have a forest/non-forest classification map based
on satellite data. Recall that the level of infrared reflectance may
be used singly or in combination with other remotely-sensed image
bands to detect vegetation. Our classification process often provides
an estimate of the per-pixel likelihood of our assignment being
correct; some cells are categorized as forest with a likely 99%
accuracy, while other cells have only a 55% chance of being forest. We
may wish to represent this uncertainty in membership, again with fuzzy
sets.

Fuzzy sets may be assigned via a membership function (Figure 10-16).
In our current example, we may represent the surety that a pixel is
dense vegetation from the infrared index we use to assign the
vegetation type. Ground visits, or truth, may show that cells with a
strong infrared index are always vegetation, while those with a weak
index are not, and intermediate indexes are split. We may develop a
graph that shows our certainty as a function of the infrared index
that connects specific levels of the index with the vegetation set.

Fuzzy raster cells are then assigned based on the input values,
transferred through the membership function, to give a corresponding
fuzzy value. High values reflect high likelihood of membership in the
vegetation set, while low values represent low likelihood of
membership.
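
A membership function and two common ways of combining the resulting membership rasters can be sketched as below. The piecewise-linear shape, the thresholds 0.25 and 0.75, and the example rasters are all assumptions for illustration; an actual membership function would be fit to ground observations. The combinations shown are the cell-by-cell maximum (often called fuzzy OR) and minimum (fuzzy AND).

```python
def linear_membership(value, low, high):
    """Membership is 0 at or below low, 1 at or above high, linear between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def apply_membership(raster, low, high):
    """Transfer every input cell through the membership function."""
    return [[linear_membership(v, low, high) for v in row] for row in raster]


def combine(rasters, func):
    """Apply func (e.g. max or min) to the co-located cells of several rasters."""
    return [[func(cells) for cells in zip(*rows)] for rows in zip(*rasters)]


# Hypothetical infrared index values and an already-fuzzified slope raster.
infrared_index = [[0.0, 0.5],
                  [0.625, 1.0]]
vegetation = apply_membership(infrared_index, low=0.25, high=0.75)
slope = [[0.9, 0.2],
         [0.5, 0.4]]

fuzzy_or = combine([vegetation, slope], max)   # largest component value
fuzzy_and = combine([vegetation, slope], min)  # smallest, more conservative
```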

Fuzzy rasters may be combined in various overlay operations, often in
estimating a membership or suitability that depends on several factors
(Figure 10-17). For example, it may be important to identify riparian
zones near water courses in a semi-arid environment. Vegetation is a
good indicator of riparian zones, but there are non-riparian areas
with dense vegetation, and riparian areas without vegetation. We might
also use elevation data and derivatives to help with the
riparian/upland zone assignment. For example, riparian areas tend to
be flatter than adjacent landscape due to sediment deposition. Local
topography within riparian areas makes the slope and height
relationships somewhat fuzzy, and so we may develop membership
functions for each of these factors and the probability of riparian
membership.

Figure: Fuzzy raster. Raster values on the left correspond to a
probability of membership.

Fuzzy membership rasters may then be combined in several ways to
assign membership. Fuzzy OR overlay assigns the probability of cell
membership to the largest value of the component cells, here the
largest value for that cell from the vegetation, slope, and rise above
water cells. Fuzzy AND assigns the minimum of the component cells, and
hence is more conservative in assignment. Other versions include fuzzy
product, sum, weighted sum, and gamma combinations.

Neighborhood, Zonal, Distance, and Global Functions


Neighborhood functions (or operations) in raster analyses deserve an
extended discussion because they offer substantial analytical power
and flexibility. Neighborhood operations are applied in many analyses
across a broad range of topics, including the calculation of slope,
aspect, and spatial correlation.

Neighborhood operations most often depend on the concept of a moving
window. A “window” is a configuration of raster cells used to specify
the input values for an operation (Figure 10-18). The window is
positioned on a given location over the input raster, and an operation
applied that involves the cells contained in the window. The result of
the operation is usually associated with the cell at the center of the
window position. The result of the operation is saved to an output
layer at the center cell location. The window is then “moved” to be
centered over the adjacent cell and the computation repeated (Figure
10-18). The window is swept across a raster data layer, usually from
left to right in successive rows from top to bottom. At each window
location, the moving window function is calculated and the result
output to the new data layer.

Moving windows are defined in part by their dimensions. For example, a
3 by 3 moving window has an edge length of three cells in the x and y
directions, for a total area of nine cells. Moving windows may be any
size and shape, but they are typically odd-numbered in both the x and
y directions to provide a natural center cell, and they are typically
square. A 3 by 3 cell window is the most common size, although windows
may also be rectangular. Windows may also have irregular shapes; for
example, L-shaped, circular, or wedge-shaped moving windows are
sometimes specified.

There are many neighborhood functions that use a moving window. These
include simple functions such as mean, maximum, minimum, or range
(Figure 10-19). Neighborhood functions may be complicated, for
example, the statistical standard deviation, and they may be
nonarithmetic, as the functions that return a count of the number of
unique values, or the mode, or a Boolean occurrence. Any function that
combines information from a consistently shaped group of raster cells
may be implemented with a moving window.

Moving window functions may be arithmetic, adding, subtracting,
averaging, or otherwise mathematically combining the values around a
central cell, or they may be comparative or otherwise extract values
from a set of cells. Common statistical operations include calculating
the largest value, the mode (peak of a histogram), median (middle
value), the range (largest minus smallest), or diversity (number of
different values). These neighborhood operations are useful for many
kinds of processing.
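
A generic moving window can be sketched as a function that gathers the cells around each center and hands them to any summary function. The helper names and the small raster are assumptions for this sketch; margin cells, where the window would fall outside the raster, are left undefined (`None`).

```python
def window_values(raster, r, c, half=1):
    """Collect the values in the square window centered on row r, column c."""
    return [raster[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def neighborhood(raster, func, half=1):
    """Sweep the window over the raster, writing func(window) to the center cell."""
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            out[r][c] = func(window_values(raster, r, c, half))
    return out


raster = [[1, 2, 2, 3],
          [1, 5, 2, 3],
          [4, 4, 2, 1],
          [4, 4, 1, 1]]

maximum = neighborhood(raster, max)
value_range = neighborhood(raster, lambda v: max(v) - min(v))
diversity = neighborhood(raster, lambda v: len(set(v)))  # count of unique values
```

Passing a different function (median, mode, standard deviation) changes the operation without changing the sweep.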

Consider the majority operation, also known as a majority filter. You
might wonder why one would want to calculate a majority filter for a
data layer. Data smoothing is a common application. We described in
Chapter 6 how multiband satellite data are often converted from raw
image data to land cover classification maps. These classifiers often
assign values on a pixel basis, and often result in many single pixels
of one land cover type embedded within another land cover type. These
single pixels are often smaller than the minimum mapping unit, the
smallest uniform area that we care to map. A majority filter is often
used to remove this classification “noise.”

A majority filter is illustrated in Figure 10-20. The figure shows
NASS crop data for an area in central Indiana, based on classified
satellite images. There are over 40 common land cover types in the
area, but these have been reclassified into the dominant types of
developed (road), corn, beans, and other crops. Each pixel is 30 m
across, and the image on the left is NASS data as delivered. Note that
corn and beans dominate, but there are many “stray” pixels of a
dissonant vegetation type embedded or on the edge of a dominant type
in an area; for example, bean pixels in a corn field, or corn pixels
in a bean field. These stray pixels usually do not represent reality,
in that although there may be the isolated plant or two from
previously deposited seed in these annual crop rotations, the patches
almost never approach 30 meters in size. The embedded cells are most
often misclassifications due to canopy thinning or perhaps weeds below
the crop, and are often below the minimum mapping unit.
The illustrated majority filter counts the values of the four cells
sharing an edge with any given cell. If a majority, meaning three or
more cells, are of a type, then the cell is output as this majority
type (top, Figure 10-20). If a majority is not reached, as when only
one or two cells add up to the most frequent type in the four
bordering cells, then the center cell value is unchanged in the output
(bottom, Figure 10-20). The removal of most of the single pixel
“noise” by the majority filter can be observed in the classified image
on the right side of Figure 10-20.
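
The four-neighbor majority rule can be sketched as follows. The function name, the `threshold` parameter, and the crop grid are assumptions for illustration. Cells on the raster margin have fewer than four neighbors; this sketch simply applies the same threshold to whichever neighbors exist, which is one of several possible choices.

```python
from collections import Counter


def majority_filter(raster, threshold=3):
    """Replace a cell when at least `threshold` edge-sharing neighbors agree."""
    rows, cols = len(raster), len(raster[0])
    out = [row[:] for row in raster]  # unchanged unless a majority is reached
    for r in range(rows):
        for c in range(cols):
            neighbors = [raster[i][j]
                         for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= i < rows and 0 <= j < cols]
            value, count = Counter(neighbors).most_common(1)[0]
            if count >= threshold:
                out[r][c] = value
    return out


# Hypothetical classified crop raster with one stray "bean" pixel in a corn field.
crops = [["corn", "corn", "corn", "bean"],
         ["corn", "bean", "corn", "bean"],
         ["corn", "corn", "corn", "bean"],
         ["corn", "corn", "bean", "bean"]]

smoothed = majority_filter(crops)
```

The filter reads from the input raster and writes to a copy, so earlier replacements do not influence later ones within the same pass.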

There may be many variants for an operation. The majority filter just
discussed may assign an output value if only two of the four adjacent
cell values are most frequent, or use the 8 or 24 nearest cells to
calculate a majority. The dependence of output on algorithm details
should be kept in mind when applying any raster operation.

Figure 10-21 shows an example of a mean calculation using a moving
window. The function scans all nine values in the window. It sums them
and divides the sum by the number of cells in the window, thus
calculating the mean cell value for the input window. The
multiplication may be represented by a 3 by 3 grid containing the
value one-ninth (1/9). The mean value is then stored in an output data
layer in the location corresponding to the center cell of the moving
window. The window is then shifted to the right and the process
repeated. When the end of a row is reached the window is returned to
the leftmost column, shifted down one row, and the process repeated
until all rows have been included.

The moving window for many simple mathematical functions may be
defined by a kernel. A kernel for a moving window function is the set
of cell constants for a given window size and shape. These constants
are used in a function at every moving window location. The kernel in
Figure 10-21 specifies a mean. As the figure shows, each cell value
for the Input layer at a given window position is multiplied by the
corresponding kernel constant, and the products are summed. The result
is placed in the Output layer.
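
A kernel-based mean can be sketched as a multiply-and-sum at each window position. The function name `apply_kernel` and the input values are assumptions for this sketch; cells where the kernel would extend past the raster are left as `None`.

```python
def apply_kernel(raster, kernel):
    """Multiply window cells by kernel constants, sum, and store at the center."""
    size = len(kernel)
    half = size // 2
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            out[r][c] = sum(kernel[i][j] * raster[r - half + i][c - half + j]
                            for i in range(size) for j in range(size))
    return out


# A 3 by 3 kernel holding one-ninth in every cell yields the window mean.
mean_kernel = [[1 / 9] * 3 for _ in range(3)]

input_layer = [[10, 12, 13, 14, 15],
               [11, 12, 14, 15, 16],
               [11, 13, 13, 16, 17],
               [12, 13, 15, 16, 18]]

output_layer = apply_kernel(input_layer, mean_kernel)
undefined_cells = sum(v is None for row in output_layer for v in row)
```

With a 4 by 5 input and a 3 by 3 kernel, only the six interior cells receive values; the remaining margin cells stay undefined.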


Figure 10-21: A mean function applied with a 3 by 3 moving window to
the Input layer.

Note that when the edge of the moving window is placed on the margin
of the original raster grid, we lose the outside rows and columns in
our output raster. This is illustrated in Figure 10-21, right. The
moving window is shown in the upper right corner of the input raster.
If we place the center cell for the window any higher or further to
the right then part of the kernel will fall outside our input data.
Output values are not defined for the cells along the top, bottom, and
side margins of the output raster when using a 3 by 3 window. Each
neighborhood operation applied to successive output layers may erode
the margin further. For larger kernels, e.g., 5x5 or 7x7, we lose a
wider margin.

We commonly address this margin erosion by defining a study area that
extends beyond the area of interest. Data may be lost at the margins,
but these data are not important if they are outside the area of
interest.

Different moving windows and kernels may be specified to implement
many different neighborhood functions. We might be interested in the
difference in a variable across a landscape, for example, changes in
canopy height to detect the boundary of a recent forest fire.

Edge detection may be based on comparing differences across a kernel.
The values on one side of the kernel are subtracted from the values on
the other side. Large differences result in large output values, while
small differences result in small output values. Edges are defined as
those cells with output values larger than some threshold.

Figure 10-22 illustrates the application of an edge detection
operation. The kernel in the center top of Figure 10-22 amplifies
differences in the x direction. The values in the left of three
adjacent columns are subtracted from the values in the corresponding
right-hand column of cells. This process is repeated for each cell in
the kernel, and the values summed or averaged across all nine cells.
Large differences result in large values, either positive or negative,
saved in the center cell. Small differences between the left and right
columns lead to a small number in the center cell. Spatial structure
such as an abrupt change in elevation may be detected by this kernel.
The kernel in the center-bottom of Figure 10-22 may be used to detect
differences in the y direction.
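
An x-direction edge detector can be sketched with a kernel that holds -1 in the left column, 0 in the middle, and 1 in the right. These coefficients, the threshold of 30, and the elevation values are assumptions for illustration and may differ from the kernel drawn in the figure.

```python
X_EDGE = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]  # right column minus left column


def apply_kernel(raster, kernel):
    """Sum of kernel constant times cell value over a 3 by 3 window."""
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(kernel[i][j] * raster[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out


# Hypothetical elevations with an abrupt rise between the third and fourth columns.
elevation = [[5, 5, 5, 20, 20],
             [5, 5, 5, 20, 20],
             [5, 5, 5, 20, 20]]

response = apply_kernel(elevation, X_EDGE)
threshold = 30
edges = [(r, c)
         for r, row in enumerate(response)
         for c, v in enumerate(row)
         if v is not None and abs(v) > threshold]
```

Transposing the kernel (top row -1, bottom row 1) gives the corresponding detector for the y direction.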


There are functions known as high-pass filters with kernels that
accentuate differences between adjacent cells. These high-pass filter
kernels may be useful in identifying the spikes or pits that are
characteristic of “noisy” data. Cells identified as spikes or pits may
then be evaluated and edited as appropriate, removing the erroneous
values. High-pass kernels generally contain both negative and positive
values in a pattern that accentuates local differences.

Figure 10-23 demonstrates the use of a high-pass kernel on a dataset
containing noise. The elevation data set shown in the top portion of
the figure contains a number of anomalous cells. These cells have
extremely high values (spikes, shown in black) or low values (pits,
shown in white) relative to nearby cells. If uncorrected, pits and
spikes will affect slope, aspect, and other terrain-based
calculations. These locally extreme values should be identified and
modified.


The high-pass kernel shown contains a value of 9 in the center and -1
in all other cells. Each value is divided by 9 to reduce the range of
the output variable. The kernel returns a value near the local average
in smoothly changing areas. The positive and negative values balance,
returning small numbers in flat areas.

The high-pass kernel generates a large positive value when centered on
a spike. The large differences between the center cell and adjacent
cells are accentuated. Conversely, a large negative value is generated
when a pit is encountered. An example shows the application of the
high-pass filter for a cell near the upper left corner of the input
data layer (Figure 10-23). Each cell value is multiplied by the
corresponding kernel coefficient. These numbers are summed, and
divided by 9, and the result placed in the corresponding output
location. Calculation results are shown as real numbers but cell
values are shown here recorded as integers. Output values may be real
numbers or integers, depending on the programming algorithm and
perhaps the specifications set by the user.
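
The high-pass calculation can be sketched with the kernel described in the text: 9 in the center, -1 elsewhere, with the summed products divided by 9. The function name and the elevation values are assumptions for this sketch. With this scaling, a flat area of value v returns v/9, while a spike or pit returns a much larger positive or negative number.

```python
HIGH_PASS = [[-1, -1, -1],
             [-1,  9, -1],
             [-1, -1, -1]]


def high_pass(raster, kernel=HIGH_PASS, divisor=9):
    """Multiply by kernel coefficients, sum, divide, and store at the center."""
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = sum(kernel[i][j] * raster[r - 1 + i][c - 1 + j]
                        for i in range(3) for j in range(3))
            out[r][c] = total / divisor
    return out


# Hypothetical flat surface at 90 with a single spike of 180.
elevation = [[90] * 7 for _ in range(5)]
elevation[2][2] = 180
filtered = high_pass(elevation)
spike_response = filtered[2][2]  # window centered on the spike
flat_response = filtered[2][5]   # window over uniformly flat cells

# The same surface with a pit of 0 in the center of a 3 by 3 window.
pit = [[90, 90, 90],
       [90, 0, 90],
       [90, 90, 90]]
pit_response = high_pass(pit)[1][1]
```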


The mean filter (Figure 10-24) both “smooths” data and increases
spatial covariance in the output data set. Averaging combines nearby
cells, bringing high or low values nearer the local mean. The nearby
values become more similar, meaning they have a higher spatial
covariance (discussed in greater depth in the spatial prediction
section of Chapter 12). Large numbers are found near large numbers,
and small near small. Low spatial covariance means nearby values are
unrelated: knowing the value at one cell does not provide much
information about the values at nearby cells. High spatial covariance
in the “real world” may be a good thing. If we are prospecting for
minerals, then a sample with a high value indicates we are probably
near a larger area of ore-bearing deposits. However, if the spatial
autocorrelation is increased by the moving window function, we may get
an overly optimistic impression of our likelihood of striking it rich.

The spatial covariance increases with many moving window functions
because these functions share cells in adjacent calculations. Note the
average function on the left of Figure 10-24 shows sequential
positions of a 3 by 3 window. The average of 11.3 is calculated for
the left window location and placed in the output layer. The window
center is then shifted one cell to the right, and the average of 11.1
for this location calculated and placed in the corresponding output
cell. Note that there are six cells in common for these two means.
Adjacent output cells share most of their cells in the mean
calculation. When a particularly low or high cell occurs, it affects
the mean of many cells in the output data layer. This causes the
outputs to be quite similar, and increases the spatial covariance.

Zonal Functions

Zonal functions apply operations based on defined regions, or zones,
within an area. Typically the zones are recorded in a data layer, with
a unique identifier for each zone. Operations are then applied by
zone, e.g., to calculate averages, a range, largest values, or
tabulate areas within zones. Figure 10-25 illustrates summarization of
U.S. National Land Cover Data (NLCD) for a set of county zones in
central Minnesota. Here, the area and percent land cover by aggregate
type were summarized by counties, with each county serving as a zone.

There are many reasons for applying zonal functions. We often want to
summarize data for defined units in a region, including total
population in a county or averages across neighborhoods. More
complicated analyses may require different operations be applied to
different zones; for example, we may be creating an elevation data set
from many sources, and we may wish to use the highest-quality data in
zones where it exists, and use successively poorer data in other
zones. Zonal functions give us these capabilities.

Figure 10-25: Zonal statistics of raster land cover by vector
boundaries. Statistics are often calculated by zones, here extracting
land cover categories from a raster.

Figure 10-26: An example of a zonal function. Averages are calculated
based on the stored zones.

Figure 10-26 illustrates the application of a zonal function. In this
example, the function calculates the zonal average for In_Layer, based
on zones defined by Zone_Layer. The syntax here is

Out_Layer = ZoneAvg(In_Layer, Zone_Layer)    (10.8)

There is no standard syntax across software, so the specific order and
interpretation of operands depend on the software used.

Note that the output here is a raster, with identical values in all
the cells of a given zone. This is a rather common version of a zonal
function. Many GIS software packages may create a table with zonal
identifiers and summary values, with or without an output data layer.
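
A zonal average that produces both forms of output, a raster and a summary table, can be sketched as follows. The function name `zonal_average` and the example layers are assumptions; this is not the syntax of any particular GIS package.

```python
def zonal_average(in_layer, zone_layer):
    """Average in_layer within each zone; return an output raster and a table."""
    totals, counts = {}, {}
    for in_row, zone_row in zip(in_layer, zone_layer):
        for value, zone in zip(in_row, zone_row):
            totals[zone] = totals.get(zone, 0) + value
            counts[zone] = counts.get(zone, 0) + 1
    table = {zone: totals[zone] / counts[zone] for zone in totals}
    # Every cell in a zone receives that zone's average.
    out_layer = [[table[zone] for zone in zone_row] for zone_row in zone_layer]
    return out_layer, table


# Hypothetical input values and zone identifiers.
in_layer = [[2, 4, 6],
            [4, 4, 8],
            [1, 3, 10]]
zone_layer = [["A", "A", "B"],
              ["A", "A", "B"],
              ["C", "C", "B"]]

out_layer, zone_table = zonal_average(in_layer, zone_layer)
```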


Zones may be defined in vector or raster layers, depending on the
software. If raster, zonal functions may require compatible coordinate
systems, cell sizes, and orientations, although sometimes software
will automatically convert via projection and resampling, as needed.


Cost Surfaces


Many problems require an analysis of travel costs. These may be
monetary costs of travel, such as the price one must charge to
profitably deliver a package from the nearest distribution center to
all points in a region. Travel costs might also be measured in other
units, for instance, the time it takes to travel from a school to the
nearest hospital, or as a likelihood, such as the chance of a noxious
foreign weed spreading out from an introduction point. These analyses
may be performed with the help of cost surfaces.

A cost surface contains the minimum cost of reaching cells in a layer
from one or more source cells (Figure 10-27). The simplest cost
surface is based on a uniform travel cost. Travel cost depends only on
the distance covered, with a fixed cost applied per unit distance
traveled. This cost per unit distance does not change from cell to
cell. There are no barriers, so the straight line distance is
converted to a cost. First, the distance is calculated from our source
or starting location to each cell. As illustrated in Figure 10-27, the
distance is calculated based on the Pythagorean formula. Distances to
each cell in the x and y directions contribute to the total distance
from a source cell or cells.


The distance from a source cell is combined with a fixed cost per unit
distance to calculate travel cost. As shown in the right side of
Figure 10-27, each distance value is multiplied by the fixed cost
factor. This results in a cost surface, a raster layer containing the
travel cost to each cell. If there are multiple source cells, travel
costs are calculated from each source cell, and the lowest cost is
typically placed in the output cell.
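
A uniform cost surface can be sketched as straight-line distance times a fixed factor, keeping the lowest cost when there are several sources. The function name and parameters are assumptions for this sketch; the 10-unit cell size, the cost factor of 2, and the source in the upper right cell follow the example in Figure 10-27.

```python
import math


def cost_surface(rows, cols, sources, cell_size=10.0, cost_per_unit=2.0):
    """Travel cost to each cell: shortest straight-line distance times a fixed cost."""
    surface = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Pythagorean distance between cell centers, lowest over all sources.
            distance = min(math.hypot((r - sr) * cell_size, (c - sc) * cell_size)
                           for sr, sc in sources)
            row.append(distance * cost_per_unit)
        surface.append(row)
    return surface


surface = cost_surface(3, 3, sources=[(0, 2)])
rounded = [[round(v, 1) for v in row] for row in surface]
```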

Note that distance is commonly measured in at least two ways: as a
straight line (Euclidean) distance, as shown in Figure 10-27, or as a
row-column distance. A row-column distance is measured along the row
and column axes, and is never shorter than the straight-line distance;
it is longer whenever the path is not parallel to a row or column.
Straight line distances are preferred in most applications, although
they are more difficult to implement.


Travel costs may also be calculated using a friction surface. The cell
values of a friction surface represent the cost per unit travel
distance for crossing each cell. Friction surfaces are used to
represent areas with a variable travel cost. Imagine a large military
base. Part of the base may include flat, smooth areas such as drill
fields, parking lots, or parade grounds. These areas are relatively
easy to cross, with correspondingly low travel times per unit
distance. Other parts of the base may be covered by open grasslands.
While the surface may be a bit rougher, travel times are still
moderate. Other parts may be composed of forests. These areas would
have correspondingly high travel times, as a vehicle would have to
pick a path among the trees. Finally, there may be areas occupied by
water, fences, or buildings. These areas would have effectively
infinite travel times.

Figure 10-27: Cost surface based on a fixed cost per unit distance.
cost = distance * fixed cost factor; e.g., cost = distance * 2.

Each cell in the friction surface contains the cost required to
traverse a portion of the cell (Figure 10-28). A value of 3 indicates
it costs three units (of time, money, or other factor) per unit
distance in the cell. If a cell is 10 units wide and costs 3 units per
unit distance, and the cell is crossed along the width, then the cost
for traversing the cell is 10 times 3, or 30 units.

The actual cost for traversing the cell depends on the distance
traveled through the cell. When a cell is traversed parallel to the
row or column edge, then the distance is simply the cell dimension.
When a cell is traversed at any other angle, the distance will vary.
It may be greater or less than the cell dimension, depending on the
angle and location of the path.


cost = cell distance * friction 


The travel cost required to reach each cell is the minimum accumulated
total of the cost times the distance to a source cell. We specify a
minimum accumulated cost because if there is more than one source
cell, there is a large number of potential paths to each of these
source cells. Distance across each cell is multiplied by the friction
surface cost for that cell and summed for a path to accumulate the
total travel cost. The lowest cost path from a source location to a
cell is usually assigned as the travel cost to that cell.


Figure 10-28 shows an example of cal- 
culations for the friction cost along a set of 
paths. These are straight line paths that 
travel either parallel to the cell boundaries 
(purely in an x or y direction) or at some 
angle across cells. 

A sample calculation of the friction costs for a path parallel to the
x axis is shown at the top middle and on the left side of Figure
10-28. Note that when traveling parallel to a cell boundary, one
half-cell width is traversed in the starting and ending cells.
Intermediate cells are crossed at a full cell width. When moving from
the starting cell to the adjacent left cell, a friction surface value
of 1 is encountered, then a friction surface value of 3. One-half the
distance, 5 units, is through the top-right cell at a per-unit
friction cost of 1. One-half the distance is through the adjacent cell
to the left, at a per-unit friction cost of 3. The total cost is then
the distance traveled in each cell multiplied by the per-unit friction
cost of the cell:

$$5 \times 1 + 5 \times 3 = 20 \qquad (10.9)$$
The friction cost when traversing cells at an angle is illustrated at
the bottom left and bottom center of Figure 10-28. The friction cost
is the sum of the cell cost per unit distance multiplied by the
distance traveled in each cell. The path begins at the source cell and
ends two cells to the left and one cell down. Each intervening cell is
traversed for a distance of 5.6 units. The distance traversed in each
cell is multiplied by the friction value for each cell. The total cost
for this leg is the sum of these products.

Figure 10-29: Row-column distance on a friction surface; cost =
row/column distance * friction.

In general, the cost of any path is expressed as

$$\text{total cost} = \sum_{i} d_i \, c_i \qquad (10.10)$$

where d_i is the distance and c_i is the cost across each cell of a
path.
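
The path cost sum can be sketched directly as distance times friction, added over the cells of a path. The parallel path uses the values from the text (5 units at friction 1, then 5 units at friction 3). For the angled path, the friction values 1, 3, 2, and 4 are hypothetical, chosen only to illustrate the arithmetic; the per-cell distance is one quarter of the straight-line length for a path two cells across and one cell down with 10-unit cells.

```python
import math


def path_cost(segments):
    """Sum of distance times friction over (distance, friction) pairs."""
    return sum(distance * friction for distance, friction in segments)


# Path parallel to the x axis: half a cell at friction 1, half a cell at friction 3.
parallel_cost = path_cost([(5, 1), (5, 3)])

# Angled path crossing four cells; each cell is traversed for about 5.6 units.
per_cell = math.hypot(20, 10) / 4
hypothetical_frictions = [1, 3, 2, 4]  # assumed values, not from the figure
angled_cost = path_cost([(per_cell, f) for f in hypothetical_frictions])
```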

Many software packages calculate the cumulative cost for the most
direct path using a slightly different approach, called the row-column
distance. Rather than travel along a straight line path, the
row-column distance travels from cell center to cell center (Figure
10-29). Calculations are much easier because the length of the path
within each cell is constant with row-column distance, and for square
cells this distance equals the cell width (or height). The distance in
each cell varies when using the straight line distance, and so the
time required to calculate the accumulated distance is substantially
increased. The row-column distance gives the same relative costs for
travel from a source cell to each target cell, but the absolute costs
change (compare the costs on the right of Figure 10-28 to the right of
Figure 10-29).

Many implementations of a friction surface or cost function allow you
to search for the minimum cost to travel to a cell from a set of
source cells. The straight line distance may not be the least costly,
and so alternatives may be examined. There are many routes from any
source cell to any destination cell, thousands of distinct routes in
most instances. Software typically implements some optimization
algorithm to eliminate routes early on and reduce search time, thereby
arriving at the cost surface in some acceptable time period.

Note that barriers may be placed on a cost surface to preclude travel
across portions of the surface. These barriers may be specified by
setting the cost so high that no path will include them. Any
circuitous route will be less expensive than traveling over the
barriers. Some software allows the specification of a unique code to
identify barriers, and this code precludes movement across the cell.
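
A minimum accumulated cost surface with barriers can be sketched using Dijkstra's shortest-path algorithm over row-column moves. The choice of Dijkstra's algorithm is an assumption for this sketch, since the text does not name the optimization a given package uses. The function name, the use of `None` as a barrier code, and the friction values are likewise hypothetical. Each move between adjacent cell centers crosses half of one cell and half of the next.

```python
import heapq
import math


def accumulate_cost(friction, sources, cell_size=10.0):
    """Minimum accumulated travel cost to every cell; None marks a barrier."""
    rows, cols = len(friction), len(friction[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        acc, r, c = heapq.heappop(heap)
        if acc > cost[r][c]:
            continue  # a cheaper route to this cell was already found
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if friction[nr][nc] is None:
                continue  # barrier code precludes movement across the cell
            step = (cell_size / 2) * (friction[r][c] + friction[nr][nc])
            if acc + step < cost[nr][nc]:
                cost[nr][nc] = acc + step
                heapq.heappush(heap, (acc + step, nr, nc))
    return cost


friction = [[2, 3, 1],
            [1, None, 1],
            [1, 1, 1]]
cost = accumulate_cost(friction, sources=[(0, 2)])
```

In this example the cheapest way to reach the middle-left cell runs around the barrier through the bottom row rather than across the higher-friction top row.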


Summary 
Raster analyses are essential tools in GIS, and should be understood
by all users.


Raster analyses are widespread and well developed for many reasons, in
part due to the simplicity of the data structure, the ease with which
continuous variables may be represented, and the long history of
raster analyses.

Map algebra is a concept in which raster data layers are combined via
summation and multiplication. Values are combined on a cell-by-cell
basis, and may be added, subtracted, multiplied, or divided. Care must
be taken to avoid ambiguous combinations in the output that originate
from distinct input combinations.


Raster analyses can be local, neighborhood, zonal, or global. Local
analyses are very commonly used because they are essential to many
tasks. Neighborhood operations are particularly common in raster
analyses, and may be applied with a moving window approach. A moving
window is swept across all cells in a data layer, typically
multiplying kernel values by data found around a center cell. Window
size and shape may be modified at the edges of the data layers. Moving
windows may be used to specify a wide range of combinatorial, terrain,
and statistical functions. Zonal functions can mimic some elements of
vector analysis by identifying regions as the basis for work. Global
functions, examined above in the context of map algebra, typically
draw on all cells in the raster to come up with single values.

Finally, there are general analyses that may be applied using raster
data sets. Commonly used are buffering and overlay, but these are
joined by more sophisticated forms such as cost or friction surfaces,
an important subset of proximity analyses that may be easily applied
in raster analyses. A cost surface identifies the travel costs
required for movement from a specified set of locations.

Study Questions


10.1 - What is map algebra? 


10.2 - Why must raster layers have compatible cell sizes and
orientations for most raster combination operations?


10.3 - What is a null value in a raster dataset? How is this null value
typically treated in a raster operation?


10.4 - Perform the listed raster operations.

[Figure: input raster and a list of operations to perform with a 3x3
window, centered on the noted cells; grid values and operation list not
recoverable.]


10.5 - Perform the listed raster operations.

[Figure: input raster and a list of operations to perform with a 3x3
window, centered on the noted cells; grid values and operation list not
recoverable.]


10.6 - What are the values in cells C1, C2, C3, and C12 in the output layer?
Con(Layer1 < 2, 0, 1)


10.7 - What are the values in cells C5, C7, C10, and C13 in the output layer?
Con(Layer1 < 2, 0, 1)

[Figure: Layer1 input grid; values not recoverable.]



10.8 - What are the cell values for cells C1, C3, C4, and C10 in the
output layer, below?


Output = CON((layerA == N), 1, layerA)

[Figure: layerA input grid; values not recoverable.]


10.9 - What are the cell values for cells C2, C5, C7, and C11 in the
output layer, below?


Output = CON((layerA == N), 1, layerA)

[Figure: layerA input grid; values not recoverable.]
10.10 - Give an example of a nested operation.


10.11 - What are the values in output cells C9, C10, C11, and C12?
Output = CON(ISNULL(layerA), 1, N)

[Figure: layerA input grid and an output grid with cells labeled C1
through C16; input values not recoverable.]


10.12 - What are the values in output cells C7, C8, C13, and C16?


Output = CON(ISNULL(layerA), 1, N)

[Figure: layerA input grid and an output grid with cells labeled C1
through C16; input values not recoverable.]


10.13 - What is the scope of a raster operation?


10.14 - Does a NOT operation applied to a raster cell value containing a NULL value
return a NULL value, a zero value, a 1, or some other non-null value?


10.15 - Diagram an AND operation on a raster data cell.


10.16 - Provide the answer for the following logical operations:

[Figure: input grids and logical operations; values not recoverable.]


10.17 - Provide the answer for the following logical operations:

[Figure: input grids containing null (N) values and logical operations;
values not recoverable.]


10.18 - Describe how local arithmetic functions can be used to apply a clip function
in a raster environment.


10.19 - What is a kernel in a moving window operation? Does the kernel size or
shape change for different portions of the raster data set? Why or why not?


10.20 - What moving window operation would most likely use the kernel below?

[Figure: kernel values not recoverable.]


10.21 - What moving window operation would most likely use the kernel below?


 1   2   1

 0   0   0

-1  -2  -1


10.22 - What is meant by high spatial covariance in a raster data layer?


10.23 - Calculate the cost of travel between A and B, and A and C, over the cost
surface below, both by straight line, and by row-column paths.

[Figure: source/target cells A, B, and C, and a cost surface; cell
values not recoverable. Cells are 10 units wide. Remember, the diagonal
of a square is 1.41 x the edge.]


10.24 - Calculate the cost of travel between A and B, and A and C, over the cost
surface below, both by straight line, and by row-column paths.

[Figure: source/target cells A, B, and C, and a cost surface; cell
values not recoverable. Cells are 10 units wide. Remember, the diagonal
of a square is 1.41 x the edge.]


11 Terrain Analysis


Introduction 


Elevation and related terrain variables are important at some point in
almost everyone's life. Elevation and slope change across the landscape
(Figure 11-1), and this variation determines where rivers flow, lakes
occur, and floods are frequent. Terrain variation influences soil
moisture and hence food production. Terrain in large part affects water
quality through sediment generation and transport. It strongly
influences transportation networks and the cost and methods of building
construction. Terrain variables are frequently applied in a broad range
of spatial analyses (Table 11-1).

Figure 11-1: An example of a terrain-based image of the western United
States. Features such as the Basin and Range region are clearly defined
(courtesy USGS).

Given the importance of elevation and other terrain variables in
resource management, and the difficulties of manual terrain analysis,
it is not surprising that terrain analysis is well developed in GIS.
Indeed, it is often impractical to perform consistent terrain analyses
without a GIS. For example, slope calculations over large areas based
on manual methods are slow, error prone, and inconsistent. Elevation
change over a horizontal distance is difficult to measure, these
measurements are slow, and estimates are likely to vary among human
analysts. In contrast, digital slope calculations are easy to program,
consistent, and have proven to be as accurate as field measurements.

Both data and methods exist to extract important terrain variables via
a GIS. Digital elevation models (DEMs), described in Chapters 2 and 7,
have been developed for most of the world using methods described in
Chapters 5 and 6, and DEM renewal and improvement continues.


Table 11-1: A subset of commonly used terrain variables (adapted from
Moore et al., 1993).

Variable          | Description                                          | Importance
Height            | Elevation above base                                 | Temperature, vegetation, visibility
Slope             | Rise relative to horizontal distance                 | Water flow, flooding, erosion, travel cost, construction suitability, geology, insolation, soil depth
Aspect            | Downhill direction of steepest slope                 | Insolation, temperature, vegetation, soil characteristics and moisture, visibility
Upslope area      | Watershed area above a point                         | Soil moisture, water runoff volume and timing, pollution or erosion hazards
Flow length       | Longest upstream flow path to a point                | Sediment and erosion rates
Upslope length    | Mean or total upstream flow path length from a point | Sediment and erosion rates
Profile curvature | Curvature parallel to the slope direction            | Erosion, water flow acceleration
Plan curvature    | Curvature perpendicular to slope direction           | Water flow convergence, soil water, erosion
Visibility        | Site obstruction from given viewpoints               | Utility location, viewshed


Calculations are based on cell values assigned to a regular grid. We
use the concept of Z values, the height stored in the raster arrays, to
extract information about terrain, using the magnitudes and patterns of
changes in Z across the grid (Figure 11-2). For example, the height
differences between adjacent cells or in a neighborhood of cells are
used to calculate a local slope (slope in Figure 11-2). The angle and
orientation of lines defined by x, y, and Z values near a point are
used to calculate the normal vector, at right angles to the local
surface (Figure 11-2). Local curvature and slope direction are also
calculated by differences in Z values in a neighborhood.

Many terrain analysis functions can be specified by a mathematical
operation applied to an appropriate moving window. The results from
these mathematical operations in turn provide important information
about terrain characteristics that are helpful in spatial analysis.




Slope and Aspect 

Slope and aspect are two commonly used terrain variables. They are
required in many studies of hydrology, conservation, site planning, and
infrastructure development, and are the basis for many other terrain
analysis functions. Road construction costs and safety are sensitive to
slope. Watershed boundaries, flowpaths and direction, erosion modeling,
and viewshed determination (discussed later in this chapter) all use
slope and/or aspect data as input. Slope or aspect may be useful in
mapping both vegetation and soil resources.

Slope is defined as the change in elevation (a rise) with a change in
horizontal position (a run). Seen in cross section, the slope is
related to the rise in elevation over the run in horizontal position
(Figure 11-3). Slope is often reported in degrees, between zero (flat)
and 90 (vertical). The slope is equal to 45 degrees when the rise is
equal to the run. The slope in degrees is calculated from the rise and
run through the tangent trigonometric function. By definition, the
tangent of the slope angle ($\phi$) is the ratio of the rise over the
run, as shown in Figure 11-3. The inverse tangent of a measured rise
over a run gives the slope angle. A steeper rise or shorter run leads
to a higher $\phi$ and hence a steeper slope.

Figure 11-2: The surface normal and slope directions at any point on a
DEM are used for calculating various terrain attributes.

Figure 11-3: Slope formula, showing the rise (A), run (B), and slope
angle ($\phi$).

$$ \text{slope as percent} = \frac{A}{B} \times 100 $$

$$ \text{slope as degrees} = \phi = \tan^{-1}\left(\frac{A}{B}\right) $$

To convert from percent slope to degrees, apply these formulas. For
example, a 3% slope is how many degrees? Since $\frac{A}{B} \times 100 = 3$,
then $\frac{A}{B} = 3/100 = 0.03$, and $\phi = \tan^{-1}(0.03) = 1.72$
degrees.

Slope may also be expressed as a percent, calculated by 100 times the
rise over the run (Figure 11-3). Slopes expressed as a percent have
magnitudes from zero (flat) to infinite (vertical), with a sign
convention inconsistently defined. Some authors define a positive slope
as uphill (0 to +∞), and a negative slope downhill (0 to -∞), while
others define slope only downhill (0 to +∞). A slope of 100% occurs
when the rise equals the run.
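
The two slope units are related only through the tangent function, which a short Python sketch can make concrete. The function names `percent_to_degrees` and `degrees_to_percent` are illustrative choices, not part of any GIS package.

```python
import math


def percent_to_degrees(percent):
    """Slope angle in degrees for a slope given as 100 * rise / run."""
    return math.degrees(math.atan(percent / 100.0))


def degrees_to_percent(degrees):
    """Percent slope for a slope angle in degrees (unbounded near 90)."""
    return 100.0 * math.tan(math.radians(degrees))


print(round(percent_to_degrees(3), 2))    # the 3% example from Figure 11-3
print(round(percent_to_degrees(100), 1))  # rise equal to run
print(round(degrees_to_percent(45), 1))
```

Note that the relationship is not linear: doubling a percent slope does not double the slope angle, and percent slope grows without bound as the angle approaches 90 degrees.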


Calculating slope from a raster data layer is more complicated than in
the cross-section view shown in Figure 11-3. The raster cells occur at
regular intervals across an irregular terrain surface. Slope direction
at a point in the landscape is typically measured in the steepest
direction of elevation change (Figure 11-4). Slope changes in a complex
way across many landscapes, and calculations of slope must factor in
the relative changes in elevations around a central cell.

Figure 11-4: Slope direction, shown as arrows for some example
locations.

As demonstrated in Figure 11-4, the aspect, or slope direction, often
does not point parallel to the raster rows or columns. Consider the
cells depicted in Figure 11-5.


Higher elevations occur at the lower right corner, and lower elevations
occur toward the upper left. The direction of steepest slope trends
from one corner towards the other, but does not pass directly through
the center of any cell. How do we obtain values for the rise and run?
Which elevations should be used to calculate slope? Intuitively we
should use some combination of a number of cells in the vicinity of the
center cell, perhaps all of them.

Elevation is often represented by the letter Z in terrain functions.
These terrain functions are usually calculated with a symmetrical
moving window. A 3 by 3 cell window is most common, although 5 by 5 and
other odd-numbered windows are also used. Each cell in the window is
assigned a subscript, and the elevation values found at window
locations are referenced by subscripted Z values.


Figure 11-6 shows an example of a 3 by 3 cell window. The central cell
has a value of 44, and is referred to as cell Z0. The upper left cell
is referred to as Z1, the upper center cell as Z2, and so on through
cell Z8 in the lower right corner.

Slope at each center cell is most commonly calculated from the formula:

$$ s = \operatorname{atan}\sqrt{\left(\frac{dZ}{dx}\right)^{2} + \left(\frac{dZ}{dy}\right)^{2}} \qquad (11.1) $$

where s is slope, atan is the inverse tangent function, Z is elevation,
x and y are the respective coordinate axes, and dZ/dx and dZ/dy are
calculated for each cell based on elevation values surrounding a given
cell. The symbol dZ/dx represents the rise (change in Z) over the run
in the x direction, and dZ/dy represents the rise over the run in the y
direction. These formulas are combined to calculate the slope for each
cell based on the combined change in elevation in the x and y
directions.


Many different formulas and methods have been proposed for calculating
dZ/dx and dZ/dy. The four nearest cells method, shown in Figure 11-6
and at the top of Figure 11-7, is the simplest, and uses the four cells
closest to Z0:

$$ \frac{dZ}{dx} = \frac{Z_5 - Z_4}{2C} \qquad (11.2) $$

$$ \frac{dZ}{dy} = \frac{Z_2 - Z_7}{2C} \qquad (11.3) $$

Here, C is the cell dimension and the Zs are defined as in Figure 11-6.
This method uses the four cells that share an edge with the focal cell,
Z4, Z5, Z2, and Z7, in calculating dZ/dx and dZ/dy. This four nearest
method is perhaps the most obvious and provides reasonable slope values
under many circumstances.
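
A minimal Python sketch of the four nearest cells method follows, applied to the 3 by 3 window of Figure 11-6 (center value 44) with an assumed cell dimension of 10 units. The function name `slope_four_nearest` and the nested-list window layout are illustrative assumptions.

```python
import math

# 3 by 3 window, row by row: Z1 Z2 Z3 / Z4 Z0 Z5 / Z6 Z7 Z8
window = [
    [42, 45, 47],
    [40, 44, 49],
    [44, 48, 52],
]


def slope_four_nearest(w, cell_size):
    """Slope in degrees at the center cell using the four nearest cells."""
    dz_dx = (w[1][2] - w[1][0]) / (2 * cell_size)  # (Z5 - Z4) / 2C
    dz_dy = (w[0][1] - w[2][1]) / (2 * cell_size)  # (Z2 - Z7) / 2C
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


print(round(slope_four_nearest(window, cell_size=10), 1))
```

The derivatives evaluate to 0.45 and -0.15, giving a slope a little over 25 degrees. In a full raster the same function would be applied at every interior cell as the window moves.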


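Worked example for Z0 (Figure 11-6), with 2C = 20:

$$ \frac{dZ}{dx} = \frac{49 - 40}{20} = 0.45 $$

$$ \frac{dZ}{dy} = \frac{45 - 48}{20} = -0.15 $$

$$ s = \operatorname{atan}\left(\left[(0.45)^{2} + (-0.15)^{2}\right]^{0.5}\right) = 25.3^{\circ} $$

Figure 11-7: Four nearest cells (top) and third-order finite difference
(bottom) methods for calculating slope. C is cell size, and dZ/dx and
dZ/dy are the changes in elevation (rise) with changes in horizontal
position (run).

Elevation values (both methods):

  42 | 45 | 47
  40 | 44 | 49
  44 | 48 | 52

Four nearest cells:

  kernel for dZ/dx      kernel for dZ/dy
   0   0   0             0   1   0
  -1   0   1             0   0   0
   0   0   0             0  -1   0

$$ \frac{dZ}{dx} = \frac{Z_5 - Z_4}{2C} = \frac{49 - 40}{20} = 0.45 \qquad \frac{dZ}{dy} = \frac{Z_2 - Z_7}{2C} = \frac{45 - 48}{20} = -0.15 $$

$$ s = \operatorname{atan}\left(\left[(0.45)^{2} + (-0.15)^{2}\right]^{0.5}\right) = 25.3^{\circ} $$

Third-order finite difference:

  kernel for dZ/dx      kernel for dZ/dy
  -1   0   1             1   2   1
  -2   0   2             0   0   0
  -1   0   1            -1  -2  -1

$$ \frac{dZ}{dx} = \frac{(Z_3 - Z_1) + 2(Z_5 - Z_4) + (Z_8 - Z_6)}{8C} = \frac{(47 - 42) + 2(49 - 40) + (52 - 44)}{80} = 0.39 $$

$$ \frac{dZ}{dy} = \frac{(Z_1 - Z_6) + 2(Z_2 - Z_7) + (Z_3 - Z_8)}{8C} = \frac{(42 - 44) + 2(45 - 48) + (47 - 52)}{80} = -0.16 $$

$$ s = \operatorname{atan}\left(\left[(0.39)^{2} + (-0.16)^{2}\right]^{0.5}\right) = 22.9^{\circ} $$


A common alternate method is known as a third-order finite difference
approach (Figure 11-7, bottom). This method for calculating dZ/dx and
dZ/dy differs mainly in the number of cells it uses in the vicinity of
the center cell, and the weighting it gives them. The four nearest
cells are given a higher weight than the “corner” cells, but data from
all eight nearest cells are used.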
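
The third-order finite difference weighting can be sketched in Python as below, again using the example window with a center value of 44 and an assumed cell size of 10. The function name `slope_third_order` is hypothetical; the weights follow the description above, with edge-sharing cells counted twice and corner cells once.

```python
import math

window = [
    [42, 45, 47],  # Z1 Z2 Z3
    [40, 44, 49],  # Z4 Z0 Z5
    [44, 48, 52],  # Z6 Z7 Z8
]


def slope_third_order(w, cell_size):
    """Return (dZ/dx, dZ/dy, slope in degrees) using all eight neighbors."""
    (z1, z2, z3), (z4, _, z5), (z6, z7, z8) = w
    dz_dx = ((z3 - z1) + 2 * (z5 - z4) + (z8 - z6)) / (8 * cell_size)
    dz_dy = ((z1 - z6) + 2 * (z2 - z7) + (z3 - z8)) / (8 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return dz_dx, dz_dy, slope


dz_dx, dz_dy, slope = slope_third_order(window, cell_size=10)
print(dz_dx, dz_dy, round(slope, 1))
```

The unrounded derivatives are 0.3875 and -0.1625. Carrying them through without rounding gives a slope that differs slightly in the last digit from a hand calculation using 0.39 and -0.16, and both are a few degrees lower than the four nearest cells estimate for the same window.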


Several other methods have been developed that are better for
calculating slope under certain conditions. Better means that, on
average, a method produces more accurate slope estimates when compared
to carefully collected field measurements. However, no method has
proved best under all terrain conditions. Literature on the methods,
their derivation, and application are listed at the end of this
chapter.

Comparative studies have shown the two methods described here to be
among the best for calculating slope and aspect over a wide range of
conditions. The method using the four nearest cells was among the best
for smooth terrain, and the third-order finite difference approach is
often among the best when applied to rough terrain.

Aspect is also an important terrain variable that is commonly derived
from digital elevation data. The aspect at a point is the steepest
downhill direction. The direction is typically reported as an azimuth
angle, with zero in the direction of grid north, and the azimuth angle
increasing in a clockwise direction (Figure 11-8). Aspects defined this
way take values between 0 and 360 degrees. Flat areas have no aspect,
because there is no downhill direction.


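Aspect ($\alpha$) is most often calculated using

$$ \alpha = 180 - \operatorname{atan}\left(\frac{dZ/dy}{dZ/dx}\right) + 90\left(\frac{dZ/dx}{\left|dZ/dx\right|}\right) $$

where atan is the inverse tangent function that returns degrees, and
dZ/dy and dZ/dx are defined as above.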
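
A hedged Python sketch of the aspect calculation is given below. It compares the commonly cited form, 180 - atan((dZ/dy)/(dZ/dx)) + 90 (dZ/dx)/|dZ/dx|, with an equivalent that uses the two-argument arctangent; the `atan2` variant is a substitution made here because it also handles dZ/dx = 0, where the ratio is undefined. Both function names are hypothetical, and dZ/dx and dZ/dy are assumed to be positive where elevation increases to the east and north, respectively.

```python
import math


def aspect_formula(dz_dx, dz_dy):
    """Aspect from the ratio form; requires dz_dx != 0."""
    return (180.0 - math.degrees(math.atan(dz_dy / dz_dx))
            + 90.0 * (dz_dx / abs(dz_dx)))


def aspect_atan2(dz_dx, dz_dy):
    """Downhill azimuth (0 = grid north, clockwise); None for flat cells."""
    if dz_dx == 0 and dz_dy == 0:
        return None  # flat: no downhill direction
    # Downhill is opposite the gradient; azimuth is measured from north
    return math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0


# Derivatives from the four nearest cells example: 0.45 and -0.15
print(round(aspect_formula(0.45, -0.15), 1))
print(round(aspect_atan2(0.45, -0.15), 1))
```

For the example window both forms give an aspect a little north of due west, which agrees with the lowest neighboring elevation lying to the west of the center cell.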


Figure 11-8: Aspect may be reported as an azimuth angle.


As with slope calculations, estimated aspect varies with the methods
used to determine dZ/dx and dZ/dy. Tests have shown the four nearest
cell and third-order finite difference methods again yield among the
most accurate results, with the third-order method among the best under
a wide range of terrain conditions.


Profile curvature and plan curvature are two other local topographic
indices that are important in terrain analysis and may be derived from
gridded elevation data. Profile and plan curvature are helpful in
measuring and predicting soil water content, overland flow,
rainfall-runoff response in small catchments, and the distribution of
vegetation.


Profile curvature is an index of the surface shape in the steepest
downhill direction (Figure 11-9). The profile curvature may be
envisioned by imagining a vertical plane, slicing downward into the
earth surface, with the plane containing the line of steepest descent
(aspect direction). The surface traces a path along the face of this
plane, and the curvature is defined by the shape of this path. Smaller
values of profile curvature indicate a concave (bowl shaped) path in
the downhill direction, and larger values of profile curvature indicate
a convex (peaked) shape in the downhill direction.

Different software packages apply different sign conventions, sometimes
making concave curvature positive, sometimes assigning it negative
values. Raw values are reported in some versions, while other packages
scale curvatures over a standard range, e.g., from 0 to 100. As with
most spatial analysis, the specific software implementation should be
verified over known test cases.

Plan curvature is the profile shape in the local direction of level, at
a right angle to the steepest direction. This means plan curvature is
measured at a right angle to profile curvature (Figure 11-9). Plan
curvature may also be envisioned as a vertical plane slicing into the
surface, and is measured in a horizontal plane. The surface traces a
path on the face of the plane, and the plan curvature is a measure of
the shape of that path. Concave plan curvature values are small or
negative for sloping valleys or clefts, while convex plan curvature
values at ridge and peak sites are large or positive.

[Figure 11-9: Profile curvature and plan curvature, with formulas;
caption not recoverable.]

These concepts of directional terrain shape [...] soil is thinner and
water scarcer on ridges and peaks because they are convex, while
materials accumulate in pits and channels [...] a ridge is convex in
one direction but relatively flat in another, while a pit is concave in
all directions (Figure 11-10). Formulas similar to those in Figure 11-9
have been developed to measure concavity in specified directions, and
combinations [...] to identify terrain features.


Hydrologic Functions


Digital elevation models are used extensively in hydrologic analyses.
Water is basic to life, commerce, and comfort, and there is a
substantial investment in water resource monitoring, gathering,
protection, and management. Spatial functions are applied to DEMs to
yield important information on hydrology.

A watershed is an area that contributes flow to a point on the
landscape (Figure 11-12). Watersheds may also be named basins,
contributing areas, catchments, drainages, and subbasins or
subcatchments. The entire uphill area that drains to any point on a
landscape is the watershed for that point. Water falling anywhere in
the upstream area of a watershed will pass through that point.
Watersheds may be quite small. For example, the watershed may cover
only a few square meters on a ridge or high slope. Local high points
have watersheds of zero area because all water drains away. Watersheds
may also be quite large, including continental areas that drain large
rivers such as the Amazon or Mississippi Rivers. Any point in the main
channel of a large river has a large upstream watershed.


The drainage network is the set of streams and rivers in a watershed,
and it is completely contained within the watershed. As shown in Figure
11-12, the stream network often shows a dendritic pattern, with smaller
watercourses branching off from larger segments as one moves upstream.
The base of the drainage network is often called a pour point or
outlet.

Figure 11-12: A watershed and drainage network in northern Minnesota.

Flow direction is used in many hydro- 
logic analyses. The true surface flow direc- 
tion is the path water would take, if dumped 
in sufficient excess on a point so as to gener- 
ate surface flow. This excess water flows in 
the steepest downhill direction, usually set 
equivalent to the local aspect. 

The use of aspect to assign flow direction may be wrong, particularly
in nearly flat areas and in built environments. Water flows both above
and below the surface; if subsurface flow is large, ignoring it may
cause errors. If soils have different permeabilities, or resistance to
flow, then subsurface flow direction may be different than surface flow
direction. In steep, undeveloped terrain, there is a strong downslope
gradient that often dominates, and surface and subsurface flow
directions are often similar, so aspect provides a reasonable
approximation of overall flow direction. In flat or nearly flat
terrain, soil permeability may dominate, causing different subsurface
and surface flow directions. Ditches, culverts, buried storm sewers,
and other built features alter flow directions in ways that aren't
represented by terrain. However, subsurface drainage and built features
are often based on modified flow directions that are first derived from
surface shape.


Flow directions may be envisioned as an arrow from a single cell to a
single adjacent cell, and stored as compass angles in a raster data
layer (Figure 11-13). Acceptable values are from 0 to 360 if the angle
is expressed in degrees azimuth. Alternately, flow direction can be
stored as a number indicating the adjacent cell to which water flows,
taking a value from 1 to 8 or some other unique identifier for each
direction towards cells.


The use of a single flow direction is an incomplete representation in
many instances. Cells often exhibit divergent flow, in multiple
directions out of a cell to multiple adjacent cells (Figure 11-14).
Flows may also be convergent, with multiple cells contributing to a
cell. The most common flow direction methods provide a single direction
for each cell, so divergent and some convergent flows are not
represented. One solution involves recording sub-cell flow directions,
but this leads to more complicated raster structures and calculations.

Figure 11-13: Flow direction (arrows and numbers), watershed, and
drainage network, derived from the flow direction for each cell.


Figure 11-14: Examples of convergent and divergent flow.

When the flow direction arrow from one cell does not point exactly at
the adjacent cell, we may distribute the flow to more than one adjacent
cell. There are various ways to distribute flows among adjacent cells.
The D8 method is common, and assigns all flow from a cell to the cell
with the steepest downhill gradient (Figure 11-15, left). The D8 method
is simple to understand, program, and store, but is particularly poor
at representing divergent flow and flow in low-gradient areas. This can
cause large errors in derived measures such as upslope contributing
area or soil moisture indexes, and lead to atypical drainage networks
in nearly flat areas. Output flow direction rasters derived from the D8
method may be represented with only 8 codes, allowing a simple and
compact data layer.
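
The D8 rule can be sketched in Python as follows. The direction codes 1 through 8, numbered clockwise from north, and the use of 0 for cells with no lower neighbor are assumptions made for this illustration; software packages use a variety of coding schemes. The function name `d8_direction` is hypothetical.

```python
import math

# Hypothetical direction codes 1-8, clockwise from north: (row step, col step)
D8_OFFSETS = {
    1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
    5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1),
}


def d8_direction(dem, row, col, cell_size=1.0):
    """Code of the neighbor with the steepest downhill gradient, or 0 if none."""
    best_code, best_gradient = 0, 0.0
    for code, (dr, dc) in D8_OFFSETS.items():
        r, c = row + dr, col + dc
        if not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
            continue  # neighbor falls outside the raster
        distance = cell_size * math.hypot(dr, dc)  # diagonal steps are longer
        gradient = (dem[row][col] - dem[r][c]) / distance
        if gradient > best_gradient:
            best_code, best_gradient = code, gradient
    return best_code


dem = [
    [42, 45, 47],
    [40, 44, 49],
    [44, 48, 52],
]
print(d8_direction(dem, 1, 1, cell_size=10))  # center cell of the window
print(d8_direction(dem, 1, 0, cell_size=10))  # lowest cell, no lower neighbor
```

Dividing by the center-to-center distance matters: a diagonal neighbor must be lower by a larger amount than an edge neighbor to win, because the run is about 1.41 times longer.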

Alternative flow direction methods may assign flow to multiple cells,
and hence represent some forms of divergent flow. One common method,
known as D-infinity, distributes flow to one downslope cell when the
flow direction is exactly toward the center of the cell, and otherwise
assigns a portion of the flow to each of the two adjacent cells in the
downslope direction (Figure 11-15, right). The split is proportional to
the angles between the steepest downslope direction and the respective
cell centers. This reduces the main shortcoming of the D8 method, while
slightly increasing complexity.
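
The proportional split can be illustrated with a short Python sketch. It assumes the steepest downslope direction is already known as an azimuth in degrees, and that the eight neighbor centers lie at multiples of 45 degrees; how that direction is estimated from the DEM is outside this sketch. The function name `dinf_split` is hypothetical.

```python
def dinf_split(flow_azimuth):
    """Fractions of flow sent to the neighbor centers bracketing the azimuth."""
    angle = flow_azimuth % 360.0
    lower = 45.0 * (angle // 45.0)     # neighbor center at or before the angle
    upper = (lower + 45.0) % 360.0     # next neighbor center, clockwise
    to_upper = (angle - lower) / 45.0  # nearer the upper center, larger share
    if to_upper == 0.0:
        return {lower: 1.0}            # flow points exactly at one cell center
    return {lower: 1.0 - to_upper, upper: to_upper}


print(dinf_split(270.0))  # exactly toward the west neighbor
print(dinf_split(300.0))  # between the west (270) and northwest (315) neighbors
```

A direction 30 degrees past the west neighbor sends two thirds of the flow to the northwest cell and one third to the west cell, because the direction lies closer to the northwest center.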

While perhaps more accurate in many conditions, multiflow direction
systems have not been widely implemented. A more common option is to
use higher-resolution raster data such that raster cell size is small
enough to make within-cell divergence or convergence impacts
negligible.

Flow accumulation area, contributing area, or upslope area are other
important hydrologic characteristics. A flow accumulation area function
is based on a flow direction surface. The flow accumulation function
places a value in each cell that is the area uphill that drains to that
cell.
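
A minimal Python sketch of flow accumulation from a single-direction flow raster is shown below. The direction codes (1 through 8 clockwise from north, 0 for no outflow), the small example raster, and the function name `flow_accumulation` are assumptions for illustration. Cells are processed from ridges downward so that each cell passes on its full upslope area only after all of its own contributors have been counted.

```python
# Hypothetical direction codes 1-8, clockwise from north; 0 = no outflow
D8_OFFSETS = {
    1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
    5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1),
}


def flow_accumulation(direction, cell_area=1.0):
    """Upslope area draining into each cell (the cell itself is not counted)."""
    rows, cols = len(direction), len(direction[0])

    def downstream(r, c):
        code = direction[r][c]
        if code == 0:
            return None
        dr, dc = D8_OFFSETS[code]
        nr, nc = r + dr, c + dc
        return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else None

    # Count how many neighbors flow directly into each cell
    inflows = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                inflows[target[0]][target[1]] += 1

    # Start at cells with no contributors and work downstream
    area = [[0.0] * cols for _ in range(rows)]
    ready = [(r, c) for r in range(rows) for c in range(cols)
             if inflows[r][c] == 0]
    while ready:
        r, c = ready.pop()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        area[tr][tc] += area[r][c] + cell_area
        inflows[tr][tc] -= 1
        if inflows[tr][tc] == 0:
            ready.append((tr, tc))
    return area


directions = [
    [5, 5, 5],  # top row flows south
    [3, 3, 5],  # middle row flows east, then south
    [3, 3, 0],  # bottom row flows east to the outlet
]
for row in flow_accumulation(directions):
    print(row)
```

In this example the outlet cell in the lower right receives flow from all eight other cells, while the top row, which has no contributors, holds zeros.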

Watersheds may be identified once a flow direction surface has been
determined. Flow direction is followed “uphill” from a point, until a
peak is reached. Each uphill cell may have many contributing cells, and
the flow into each of these cells is also followed uphill. The uphill
list is accumulated recursively until all cells contributing to the
starting cell have been identified, and thus the watershed is defined.
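
The uphill search can be sketched in Python as below. An explicit stack is used in place of recursion, a substitution made here so that large rasters do not exceed recursion limits; the logic is otherwise the uphill accumulation described above. The direction codes, the example raster, and the function name `watershed` are assumptions for illustration.

```python
# Hypothetical direction codes 1-8, clockwise from north; 0 = no outflow
D8_OFFSETS = {
    1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
    5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1),
}


def watershed(direction, pour_point):
    """Set of (row, col) cells whose flow eventually reaches `pour_point`."""
    rows, cols = len(direction), len(direction[0])
    basin = {pour_point}
    stack = [pour_point]
    while stack:
        r, c = stack.pop()
        # A neighbor contributes if moving by its own offset lands on (r, c)
        for dr, dc in D8_OFFSETS.values():
            nr, nc = r - dr, c - dc
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in basin:
                continue
            if D8_OFFSETS.get(direction[nr][nc]) == (dr, dc):
                basin.add((nr, nc))
                stack.append((nr, nc))
    return basin


directions = [
    [5, 5, 5],  # top row flows south
    [3, 3, 5],  # middle row flows east, then south
    [3, 3, 0],  # bottom row flows east to the outlet
]
print(sorted(watershed(directions, (1, 1))))
print(len(watershed(directions, (2, 2))))
```

The watershed of the center cell contains four cells including the cell itself, and the watershed of the lower right outlet covers the whole raster.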

Flow direction in flat areas is difficult to calculate and prone to
error. Aspect is undefined in a truly flat region, because there is
zero gradient. Flow directions in these cases may be strongly
influenced by small height errors, so flow directions are sometimes
manually specified, or the aspect calculated using a larger cell size
or neighborhood. The neighborhood may be successively expanded until an
unambiguous flow direction is defined.

Figure 11-15: The D8 flow direction method (above left) assigns all
flow to the cell center closest to ...

Vector incision is another common method for prescribing flow direction
in flat areas. A vector flowpath, e.g., a digitized stream segment, is
overlain with the raster, and raster values modified downstream along
the vector to specify an appropriate flow direction. The most common
approach lowers elevations along the flowpath, taking care to not
create a sink along or at the end of the flowpath.

A drainage network is the set of cells through which surface water
flows. Streams, creeks, and rivers occur where flow directions
converge. Thus, a flow direction may be used to produce a map of likely
stream location, prior to field mapping (Figure 11-13, Figure 11-16). A
drainage network may be defined as the set of all cells with a
contributing uphill area larger than some threshold. These drainage
networks are only approximations, because the method does not
incorporate soil texture, depth, porosity, subsoil water movement, or
other properties that affect surface flow. Nonetheless, a drainage
network derived from terrain data alone is often a useful first
approximation. The uphill area for each cell may be calculated, and the
area compared to the threshold area. The cell is marked as part of the
drainage network if the area surpasses the threshold.


A drainage network may have discontinuous lines when local small dams
or sink areas capture flow, where all surrounding cells point into, and
none out of, a location (Figure 11-16). This may create cells
immediately downhill from the sink that have a zero upstream area. The
stream will end, but then may begin again further downhill. Natural
sinks may be quite common in karst regions, where sinkholes occur on
the surface due to collapsed subterranean caverns (Figure 11-17).
Hydrologic sinks are also common along drainage ways in dry areas where
check dams are built to reduce flash flooding or store water. Pits are
also common in areas of deranged topography, for example, in relatively
flat, recently glaciated terrain. In most other areas, pits are often
data artifacts and do not represent real geography. Pits represented in
DEMs should be evaluated early during processing to determine if they
are real, and how processing alters results.

Random errors in DEM elevation values often create spurious pits (also
known as false sinks). Because our technologies for creating DEMs are
imperfect, DEMs often contain these pits that aren't on the Earth's
surface.

Spurious pits are found in most DEMs due to small elevation errors. For
example, DEM data collected with LIDAR often have a small ground
footprint, and may sample small features that are above the surrounding
ground level. A laser image over a recently plowed field may return
spot heights for local mounds and furrows, incompletely harvested
crops, and farm machinery. A log or dense shrubs in a steep-sided
ravine may be misidentified as the ground surface, creating a barrier
in the data that doesn't represent true conditions. Pits can be
artifacts of interpolation methods that are used to fill in the grid
values in unsampled locations. Post processing aims to remove these
spurious readings, but they are common nonetheless.

Pits may cause problems over locally flat surfaces, often along
drainage ways (Figure 11-18). Flow direction and flow accumulation
functions often return errors due to spurious pits, particularly near
watercourses. These low areas are shown as white patches in the figure.
These apparent ponds do not exist in many landscapes, in that an
erroneous pit in a stream course creates false [...]

Pits cause errors in subsequent hydrologic calculations. Drainage
networks are incomplete, flow accumulation values are too low, and
watersheds may be improperly identified when pits are encountered
(Figure 11-19).

DEMs must be “conditioned” to remove erroneous depressions (Figure
11-20). This involves pit identification, followed by either filling or
downcutting downstream cells to remove the pit. A threshold is often
specified above which a pit is considered a true part of the landscape
and not removed. This threshold is typically larger than common
vertical errors in the data but also less than any true, “on the
ground,” pit depth. Known pits may be identified prior to the filling
process and left unfilled. Once spurious pits are removed, further
processing to identify watersheds and drainage networks may proceed.

Figure 11-20: Steps in a watershed delineation (condition DEM, flow
direction, flow accumulation, stream threshold, place outlet, watershed).


Pit removal may substantially alter the DEM,
and the scope of alteration may depend on
the method (Figure 11-21). A fill process
raises the values of a local depression until
all cell values are at least equal to the value
at the local "rim" or edge of the depression
(Figure 11-21, center). This may create a flat
surface with no unambiguous drainage
direction, so some variants of the fill process
add a small slope over large fill areas to
ensure drainage toward a downhill direction.
Pits may also be removed through a breaching
process (Figure 11-21, bottom), in which
cells along a steepest gradient are lowered,
searching a specified surrounding area to
identify the steepest downhill path.
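
The fill process described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a priority-flood style fill, which is one common way to raise depressions to their rim elevation and is not necessarily what any particular GIS package does. The function name fill_pits and the list-of-lists DEM are hypothetical, and no depth threshold or added slope is applied.

```python
import heapq


def fill_pits(dem):
    """Raise cells in closed depressions to the elevation of their lowest rim.

    dem is a list of equal-length rows of elevations (an assumed layout).
    Edge cells are assumed to drain off the grid.
    """
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []

    # Seed the queue with every edge cell.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True

    # Work inward, always expanding from the lowest cell reached so far.
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0):
                    continue
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if visited[nr][nc]:
                    continue
                visited[nr][nc] = True
                # A neighbor below the current spill level sits in a pit: raise it.
                filled[nr][nc] = max(filled[nr][nc], z)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


dem = [
    [9, 9, 9, 9, 9],
    [9, 3, 4, 8, 9],
    [9, 2, 3, 8, 7],
    [9, 9, 9, 9, 9],
]
for row in fill_pits(dem):
    print(row)
```

In this small grid the interior low cells are enclosed by a rim whose lowest point is 8, so they are raised to 8, producing the flat filled surface that the text notes may have no unambiguous drainage direction.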

As shown in Figure 11-21, breaching
may sometimes better reflect drainage
pathways than fills, and may
result in more "natural" landscapes. It often
depends on the nature of the depression,
whether it is due to a spurious, small, isolated
low elevation value (fill usually preferred
for conditioning), or a narrow, high,
linear feature, often built and with a culvert
or other subsurface drainage way (breach
usually preferred for conditioning). Unfortunately,
many GIS software packages do not provide a
breach function, even though breaching is
increasingly useful for high-resolution
DEMs based on LiDAR over urban or built-up
areas.


Drainage and watershed geography
inferred from terrain analysis depend substantially
on the algorithm used, particularly
for flow direction, so care should be taken in
identifying the methods and thresholds that
give sufficiently accurate results for the
intended tasks. Many software packages only provide
depression filling and D8 flow direction,
and often result in erroneous flowpaths in
flat or near-flat terrains. The broadest range
of general hydrologic and general terrain
analysis tools is currently provided by
Whitebox GAT, developed and maintained
by John Lindsay at the University of Guelph.

To review, the steps for identifying a
watershed from a DEM are shown in Figure
11-22. DEMs are conditioned as needed, and
then the flow direction, accumulation,
stream threshold, and watershed boundaries
are calculated. Different conditioning and flow
accumulation methods may result in slightly
different stream locations and, in some
cases, watershed boundaries.

Several other hydrologic indices have
been developed to identify locally convergent
or divergent terrain positions, or terrain
morphometry related to hydrography. These
indexes are used in many subsequent topographic
and hydrologic analyses, such as
predicting plant community composition or
growth, erosion modeling, or estimating the
rainfall required to saturate an area and predict
the likelihood and intensity of flooding.


Figure 11-22: An example of the steps required to create watershed and
drainage network features from a DEM. Panels: Conditioned Elevation,
Flow Direction, Stream Threshold and Outlet.

The specific catchment area (SCA) is
defined as the total area draining to a point
relative to drainage width, in raster data sets
calculated as


SCA = AREA / C


where AREA is the accumulated surface area
upstream from a point, and C is the raster
cell dimension. Stream power index (SPI) is
defined as:


SPI = SCA * tan(b)


where b is the slope at a point, and SCA is as
defined above. SPI is used to identify the
potential erosion at a point, which depends
both on the upstream area and hence ability
to accumulate water, and the local slope,
which drives the erosive energy in water
flow.

Perhaps the most commonly applied
wetness index is calculated by:


w = ln(SCA / tan(β))


where w is the wetness index at a cell, SCA is
the specific catchment area, and β is the
slope at the cell. This index has been shown
to effectively represent the increased soil
wetness due to large upslope areas and low
slopes, particularly when combined with
plan curvature and profile curvature measurements.
These factors sort terrain along
ridge-to-stream and convex-to-concave gradients.
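
As a small worked illustration of the three indices, the sketch below evaluates the specific catchment area, stream power index, and wetness index for a single cell. The function names, the use of degrees for slope, and the min_slope_deg floor (added only so a perfectly flat cell does not divide by zero) are assumptions made for this example, not part of the definitions above.

```python
import math


def specific_catchment_area(area, cell_size):
    """SCA = AREA / C, upstream area per unit of drainage width."""
    return area / cell_size


def stream_power_index(sca, slope_deg):
    """SPI = SCA * tan(b), with slope b given in degrees."""
    return sca * math.tan(math.radians(slope_deg))


def wetness_index(sca, slope_deg, min_slope_deg=0.1):
    """w = ln(SCA / tan(beta)); min_slope_deg is a hypothetical guard for flat cells."""
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(sca / math.tan(beta))


# A cell with 9000 square meters of upstream area on a 30 m raster.
sca = specific_catchment_area(9000.0, 30.0)
for slope in (2.0, 10.0, 30.0):
    print(slope, round(stream_power_index(sca, slope), 1),
          round(wetness_index(sca, slope), 2))
```

For a fixed catchment area, the printed values show the stream power index rising with slope while the wetness index falls, which matches the interpretation of each index given in the text.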

There are many other topographic indices,
e.g., for estimating total solar radiation,
surface air drainage, or surface roughness.
These and others are described in the references
at the end of this chapter.


Contour Lines 


Contour lines, or topographic contours,
are connected lines of uniform elevation that
run at right angles to the local slope. Contour
lines are a common feature on many map
series; for example, they are depicted on the
USGS 1:24,000 scale nationwide series, and
Britain's 1:50,000 Ordnance Survey maps.
The shape and density of contour lines provide
detailed information on terrain height
and shape in a two-dimensional map, without
the need for continuous tone shading.
Both color and continuous tone printing
were important limitations for past cartographers.
Contour lines could be easily drawn
with simple drafting tools. Although continuous
tone printing is much less expensive
today, contours will remain common as they
have entered the culture of map making and
map reading.

Figure 11-23: Contour placement. A contour passes through a given height
value at a point on the straight line between known points with measured
heights, at a distance calculated by linear interpolation.


Several rapid, efficient methods have
been developed for calculating contours,
either from points or from grid data (Figure
11-23). Early contour maps and DEMs were
developed from height measurements at a set
of points. While useful, these points did not
provide clear depictions of elevation. Contour
lines of fixed values were interpolated
linearly between nearest measurement
points, as shown in Figure 11-23. Later measurement
methods either identified contour
lines directly from stereopairs (see Chapter
6), or derived them from mechanically or
electronically produced rasters. Raster to
contour generation also typically follows a
linear interpolation. For a raster, appropriate
adjacent cell centers are selected, and contour
values interpolated as illustrated in Figure
11-23.
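
The linear interpolation used to place a contour between two measured points can be written as a short function. This is a minimal sketch of standard linear interpolation; the function name, argument names, and the example heights of 220 and 320 are illustrative assumptions and are not taken from the figure.

```python
def distance_to_contour(z_start, z_end, contour, distance):
    """Distance from the start point to where a contour crosses the line.

    z_start, z_end: heights at the two known points
    contour: the contour height to be placed
    distance: horizontal distance between the two points
    """
    if z_start == z_end:
        raise ValueError("heights are equal; no single crossing point exists")
    if not min(z_start, z_end) <= contour <= max(z_start, z_end):
        raise ValueError("contour height does not fall between the two points")
    # The crossing lies the same fraction along the line as the contour
    # height lies between the two measured heights.
    fraction = (contour - z_start) / (z_end - z_start)
    return distance * fraction


# Hypothetical heights: 220 at the start, 320 at the end, 148 units apart.
print(distance_to_contour(220.0, 320.0, 250.0, 148.0))
```

The same function applies to adjacent raster cell centers, with the cell spacing used as the distance.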

Contour lines are typically created at
fixed height intervals, for example, every 30
m (100 ft) from a base height (Figure 11-24).
Because each line represents a fixed elevation
above or below adjacent lines, the density
of contour lines indicates terrain
steepness. Point A in Figure 11-24 falls in a
flat area (the foreground of the photo, at bottom),
where elevation does not change
much, and there are few contour lines. Steep
areas and cliffs are depicted by an increase
in contour density, as shown at point B, with
changes in steepness depicted by changes in
density (above and below point C). Peaks,
such as the top of Washington's Column, D,
and North Dome, E, appear as concentric
rings. Note that contours may succinctly represent
complex terrain structures, such as the
curving arches in the center of the photograph
and shown below point C, and the
overhanging cliff, to the left of point F.


Profile Plots 


Profile plots are another common derivative
of elevation data. These plots sample
elevation along a linear profile path, and display
elevation against distance in a graph
(Figure 11-25). Elevation is typically plotted
on the y axis, and horizontal distance on the
x axis. These profile plots are helpful in visualizing
elevation change, slope, and cumulative
travel distance along the specific profile
path. Profile plots are common on the edges
of maps, particularly maps of off-road, bicycle,
or cross-country routes.

Profile plots often have some level of
vertical exaggeration because horizontal distances
are usually much larger than elevation
gain. Vertical exaggeration is a scaling factor
applied to the elevation data when shown on
the graph. For example, Figure 11-25 shows
a square graph that depicts approximately 31
km across the Earth's surface. The vertical
elevation axis spans approximately 2.5 km
over the same dimensions on the graph. This
is a vertical exaggeration of approximately
12 (from 31/2.5).
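
The arithmetic behind this example can be checked with a short calculation. The sketch below treats vertical exaggeration as the ratio of the vertical plotting scale to the horizontal plotting scale; the function and parameter names are assumptions for illustration only.

```python
def vertical_exaggeration(horizontal_extent, vertical_extent,
                          plot_width=1.0, plot_height=1.0):
    """Ratio of vertical to horizontal plotting scale.

    Extents are ground distances in the same units (for example, km);
    plot_width and plot_height are graph dimensions in the same units
    as each other (a square graph uses equal values).
    """
    horizontal_scale = plot_width / horizontal_extent
    vertical_scale = plot_height / vertical_extent
    return vertical_scale / horizontal_scale


# The square graph described above: 31 km across, 2.5 km of elevation range.
print(round(vertical_exaggeration(31.0, 2.5), 1))
```

For a square graph the result reduces to the horizontal extent divided by the vertical extent, here 31/2.5, or about 12.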


Figure 11-25: Elevation profile, with distance along the profile (meters)
on the x axis.


Viewsheds


The viewshed for a point is the collection
of areas visible from that point. Views
from many locations are blocked by terrain.
Elevations will hide points if the elevations
are higher than the line of sight between the
viewing point and target point (Figure
11-26).

Viewsheds and visibility analyses are
quite important in many instances. High-voltage
power lines or cell towers are often
placed after careful consideration of their
visibility, because most people are averse to
viewing them. Communications antennas,
large industrial complexes, and roads are
often located at least partly based on their
visibility, and viewsheds are specifically
managed for many parks and scenic areas.

Figure 11-26: Lines of sight from a viewpoint, showing visible terrain and
hidden terrain.


A viewshed is calculated based on cell-to-cell
intervisibility. A line of sight is drawn
between the view cell and a potentially visible
target cell (Figure 11-26). The elevation
of this line of sight is calculated for every
intervening cell. If the slope to a target cell is
less than the slope to a cell closer to the
viewpoint along the line of sight, then the
target cell is not visible from the viewpoint.
Specialized algorithms have been developed
to substantially reduce the time required to
calculate viewsheds, but in concept, lines of
sight are drawn from each viewpoint to each
cell in the digital elevation data. If there is
no intervening terrain, the cell is classified
as visible. The classification identifies areas
that are visible and areas that are hidden
(Figure 11-27). Viewsheds for line or area
features are the accumulated viewsheds from
all the cells in those features.
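
The slope comparison described above can be sketched for a single line of sight. The example assumes elevations have already been sampled at equal spacing along the line from the viewpoint outward; the function name, the observer_height parameter, and the sample profile are illustrative assumptions. A full viewshed would repeat this test along lines to every cell.

```python
def visible_along_profile(elevations, cell_size, observer_height=0.0):
    """Flag which cells along one line of sight are visible from the first cell.

    elevations: heights sampled at equal spacing, starting at the viewpoint
    cell_size: horizontal spacing between samples
    observer_height: hypothetical eye height above the viewpoint surface
    """
    eye = elevations[0] + observer_height
    visible = [True]  # the viewpoint cell itself
    steepest = float("-inf")  # largest slope to any closer cell so far
    for i in range(1, len(elevations)):
        slope = (elevations[i] - eye) / (i * cell_size)
        # A target is hidden when its slope is less than the slope
        # to some cell closer to the viewpoint.
        visible.append(slope >= steepest)
        steepest = max(steepest, slope)
    return visible


profile = [100.0, 90.0, 95.0, 80.0, 120.0]
print(visible_along_profile(profile, cell_size=10.0))
```

In this profile the fourth cell sits behind the small rise at the third cell and is flagged as hidden, while the high final cell is visible.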


Figure 11-27: An example of a viewpoint and corresponding viewshed.


Shaded Relief Maps


A shaded relief map, also often referred
to as a hillshade map, is a depiction of the
brightness of terrain reflections given a terrain
surface and sun location. Although
shaded relief maps are rarely used in analyses,
they are among the most effective ways
to communicate the shape and structure of
terrain features, and many maps include
relief shading (Figure 11-28).

Shaded relief maps are developed from
digital elevation data and models of light
reflectance. An artificial sun is "positioned"
at a location in the sky and light rays are projected
onto the surface depicted by the elevation
data. Light is modeled as striking a
surface either as a direct beam, from the sun
to the surface, or from background "diffuse"
sunlight. Diffuse light is scattered by the
atmosphere, and illuminates "shaded" areas,
although the illumination is typically much
less than that from direct beam.


Figure 11-28: Relief shading is often added to mapped data to provide a
sense of the terrain.


The brightness of a cell depends on the
local incidence angle, the angle between the
incoming light ray and the surface normal,
shown as θ in Figure 11-29. The surface normal
is defined as a line perpendicular to the
local surface. Direct beam sunlight striking
the surface at a right angle (θ = 0) provides
the brightest return, and hence appears light.
As θ increases, the angle between the direct
beam and the ground surface deviates from
perpendicular, and the brightness decreases.
Diffuse sunlight alone provides a relatively
weak return, and hence appears dark. Combinations
of direct and diffuse light result in
a range of gray shades, and this range
depends on the terrain slope and angle relative
to the sun's location. Hence, subtle variations
in terrain are visible on shaded relief
maps.

Calculating a shaded relief surface
requires specifying the sun's position, usually
via the solar zenith angle, measured
from vertical down to the sun's location, and
the solar azimuth angle, measured from
north clockwise to the sun's position (Figure
11-30). Local slope and surface azimuth
define a surface normal direction. An angle
may be defined between the solar direction
and the surface normal direction, shown as θ
in Figure 11-30. As noted earlier, the amount
of reflected energy decreases as θ increases,
and this may be shown as various shades of
grey in a hillshade surface.
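
One common way to turn these angles into a brightness value is sketched below. It uses the standard spherical relationship for the cosine of the angle between the sun direction and the surface normal, and assumes that direct-beam brightness is proportional to that cosine; this proportionality, the function names, and the omission of diffuse light and shadowing are assumptions for illustration.

```python
import math


def incidence_cosine(zenith_deg, azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the sun direction and the surface normal.

    zenith_deg: solar zenith angle, measured from vertical
    azimuth_deg: solar azimuth, clockwise from north
    slope_deg: surface slope from horizontal
    aspect_deg: direction the surface faces, clockwise from north
    """
    z = math.radians(zenith_deg)
    s = math.radians(slope_deg)
    d = math.radians(azimuth_deg - aspect_deg)
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(d)


def direct_brightness(zenith_deg, azimuth_deg, slope_deg, aspect_deg):
    """Relative direct-beam brightness from 0 (dark) to 1 (brightest)."""
    return max(0.0, incidence_cosine(zenith_deg, azimuth_deg, slope_deg, aspect_deg))


# Sun in the northwest (azimuth 315), 45 degrees from vertical.
print(round(direct_brightness(45, 315, 0, 0), 3))     # flat ground
print(round(direct_brightness(45, 315, 45, 315), 3))  # slope facing the sun
print(round(direct_brightness(45, 315, 45, 135), 3))  # slope facing away
```

The slope tilted directly toward the sun has an incidence angle of zero and is brightest, flat ground is intermediate, and the slope facing away receives essentially no direct beam.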

A shaded relief map also requires a calculation
of visibility, often prior to calculating
the reflectances. Visibility to the sun is
determined; if a cell is visible from the sun,
the slope and aspect values are used to
assign the cell brightness.


Terrain Analysis Software 

Terrain analysis and DEM data management
and analysis are important enough to
be included in most general-purpose GIS
packages, including ArcGIS, GRASS,
ERDAS, Idrisi, and Manifold. While they
support the most common set of terrain and
hydrologic analyses, none of these packages
includes the broadest range of terrain processing
and analysis functions. Specialized
analyses are often performed using software
with a specific focus on terrain analysis.
These include the Whitebox GAT, from the
University of Guelph, and commercial tools,
such as the Watershed Modeling System
(WMS) by the Scientific Software Group.

Whitebox GAT contains what is likely
the most comprehensive set of terrain analysis
functions in a freely available package.
Support is particularly strong for hydrologic
surface and stream link processing and analysis,
with functions for calculating various
flow direction, accumulation and watershed
delineation methods typically not supported
by other packages. Basic terrain modification,
LiDAR data input and processing, and
general raster GIS functions are also supported.

LandSerf is a package with particularly
strong support for terrain shape and geomorphological
analysis, in addition to a strong
focus on surface visualization. Multiple
methods of calculating and combining first-
and second-order terrain gradients are supported,
as well as basic elevation data conversion.
LandSerf is written
in Java, and hence available across the widest
range of operating systems.


ArcHydro is a set of hydrologic analysis
tools written as an extension to ArcGIS. It
supports a fairly complete set of hydrologic
and watershed delineation functions.

There are many other packages available,
including SAGA, TAUDEM, Surfer,
TAP, and MicroDEM, which provide various
specialized capabilities, and may be worth
investigating for users interested in terrain
and hydrologic analysis.


Summary

Terrain analyses are commonly per-
formed within the framework of a GIS. 
These analyses are important because terrain 
governs where and how much water will 
accumulate on the landscape, how much 
sunlight a site receives, and the visibility of 
human activities. 

Slope and aspect are two of the most
used terrain variables. Both are commonly
calculated via trigonometric functions
applied in a moving window to a raster
DEM. Several kernels have been developed
to calculate changes of elevation in x and y
directions, and these component gradients
are combined to calculate slope and aspect.

Profile curvature and plan curvature are
two other important terrain analysis functions.
These functions measure the relative
convexity or concavity in the terrain, relative
to the downslope direction for profile curvature
and the cross-slope direction for plan
curvature.

Terrain analyses are also used to develop
and apply hydrologic functions and models.
Watershed boundaries, flow directions,
and drainage networks may all be
defined from digital elevation data.

Viewsheds are another commonly
applied terrain analysis function. Intervisibility
may be computed from any location on
a DEM. A line of sight may be drawn from
any point to any other point, and if there is
no intervening terrain, then the two points
are intervisible. Viewsheds are often used to
analyze the visibility of landscape alterations
or additions, for example, when siting new
roads, powerlines, or large buildings.

Finally, relief shading is another common
use of terrain data. A shaded relief map
is among the most effective ways to depict
terrain. Terrain shading is often derived from
DEMs and depicted on maps.


Study Questions


11.1 - What are digital elevation models, and why are they used so often in spatial 


11.2 - How are digital elevation data created?


11.3 - Write the definition of slope and aspect, and the mathematical formulas used to
derive them from digital elevation data. 


11.4 - Calculate dZ/dx and dZ/dy for the following 3 x 3 windows. Elevations and the
cell dimension are in meters.


[Table: 3 x 3 windows, with dZ/dx and dZ/dy to be calculated by the 4-nearest cell
and 3rd-order finite difference methods]


11.5 - Calculate dZ/dx and dZ/dy for the following 3 x 3 windows. Elevations and the
cell dimension are in meters.


[Table: 3 x 3 windows, with dZ/dx and dZ/dy to be calculated by the 4-nearest cell
and 3rd-order finite difference methods]


11.6 - Calculate the slope and aspect for the underlined cell values, using the four
nearest cell method.


11.7 - Calculate the slope and aspect for the underlined cell values, using the four
nearest cell method.


11.9 - Calculate the slope and aspect for the underlined cell values, using the third-order
finite difference method.


11.11 - What is an elevation contour?


11.12 - Draw the approximate location of contours for the following set of points.
Start contours at the 960 value and use a 30 unit contour interval. For this exercise, it
is permissible to estimate the contour locations visually; you do not have to calculate
the distances between points to place the contour lines.


11.13 - Draw the approximate location of contours for the following set of points.
Start contours at the 0 value and use a 200 unit contour interval. For this exercise, it is
permissible to estimate the contour locations visually; you do not have to calculate
the distances between points to place the contour lines.


11.14 - What is the formula to calculate contour height from two measured elevations?


11.15 - Using the figure below, calculate the distances to the listed contour line along
the shortest path between points. The example shows the distance calculation from
point A to the contour with value 250 along the line from A to B, when the
values of A and B are shown, and the distance from A to B is 148.


Distance from B to the 300 contour on the line B - D, when dBD is 94
Distance from E to the 250 contour on the line E - D, when dED is 115
Distance from C to the 200 contour on the line C - D, when dCD is 188
Distance from E to the 300 contour on the line E - G, when dEG is 248


[Figure: labeled points with elevation values]


11.16 - Using the figure above, answer the following:
Distance from A to the 200 contour on the line A - C, when dAC is 94
Distance from E to the 300 contour on the line E - D, when dED is 115
Distance from F to the 400 contour on the line F - G, when dFG is 178
Distance from B to the 350 contour on the line B - F, when dBF is 224
Distance from E to the 250 contour on the line E - G, when dEG is 248


11.17 - What are the plan curvature and profile curvature, and how do they differ?


11.18 - Define the watershed boundaries and possible stream locations in the digital
elevation data depicted below:


11.19 - Define the watershed boundaries and possible stream locations in the digital
elevation data depicted below:


[Figure: digital elevation data grid]


11.20 - Define the following: solar zenith angle, solar azimuth angle, and solar inci- 
dence angle. 


11.21 - Draw a diagram illustrating the solar incidence angle, and identify what site 
terrain factors affect the solar incidence angle. 


11.22 - What are viewsheds, when are they used, and how are they calculated? 


11.23 - What is a shaded relief map? How are the values for each cell of a hillshade 
surface calculated? 


12 Spatial Estimation: Interpolation,
Prediction, and Core Area Delineation


Introduction 


Spatial prediction methods are used to
estimate values from known locations at
unknown ones (Figure 12-1). An obvious
question is, why estimate? Why not just
measure the value at all locations? Predictions
are required because time and money
are limiting. At a more basic level, there is
an infinite number of potential sampling
locations for any continuous variable in any
study area, and it is impossible to measure at
all locations. While there is a finite number
of discrete objects in all studies, there are
usually too many to measure them all. Practical
constraints usually limit samples to a
subset of the possible lines, polygons,
points, or raster cell locations.


Spatial prediction may be required for
other reasons. Besides cost, some areas may
be difficult or impossible to visit. A parcel
owner may prohibit entry. It may be too dangerous
to collect samples, for example, in
part of a park because lions may eat the
sampling crew, or elephants trample them.

Spatial prediction may be required due
to missing or otherwise unsuitable samples.
If it is difficult, expensive, or the wrong season
for sampling, it may be impossible to
replace lost samples. Samples may be discovered
as unreliable or suspect once the
measuring crew has returned. Suspect "outlier"
points are often dropped from data sets.
These now missing points may be crucial to
the analysis and, if so, the missing values
must be estimated. Finally, estimates may be
required when changing to a smaller cell size
in a raster data set. The "sampling" frequency
is set by the original raster, and values
must be estimated for the new, smaller
cells.


Spatial interpolation is the prediction of
variables at unmeasured locations, based
on a sampling of the same variables at
known locations. Most interpolation methods
rely on the nearest points to estimate
missing values, and use some measure of
distance from known to unknown values. We
might have measured air pollution at a set of
towers across a region, but need estimates
for all locations in that region. Interpolation
is routinely used to estimate air and water
temperature, soil moisture, elevation, ocean
productivity, population density, and a host
of additional variables.


Spatial prediction also involves the estimation
of variables at unsampled locations,
but differs from interpolation in that estimates
are based at least in part on other variables,
and often on a total set of
measurements. We may use elevation to help
estimate temperature because it is often
cooler at higher locations. A map of elevations
may be combined with a set of measured
temperatures to estimate temperatures
at unknown locations.

А core area is characterized by high use, 
density, intensity, or probability of occurrence
for a variable or event. Core areas are
defined from a set of samples, and are used
to predict the frequency or likelihood of
occurrence of an object or event. Home
ranges for individual animals, concentrations
of business activity, or centers of criminal
activity are all examples of core areas.
There are several methods that may be used
in identifying these core areas. These methods
typically draw from a set of sample
points that constitute events, such as an
observation of an animal, a business location,
or a crime that has been committed.


Spatial prediction typically translates 
from lower spatial dimensions to the same or 
higher dimensions. This means we typically 
generate points or lines from point data, or 
areas from point, line, or area data. Predic- 
tion methods allow us to extend the informa- 
tion we have collected, most often to “fill in" 
between sampled locations, but also to 
improve the quality of the data we have col- 
lected. 


Spatial prediction methods may also be
used to translate information from a higher
order to a lower order, that is, to estimate
point values from data collected or aggregated
to areas or lines. We may have population
data reported for an area, and we may
wish to estimate population for a specific
point within this area. This may be affected
by the modifiable areal unit problem, a common
hazard in spatial estimation methods
described in Chapter 9.


Whatever the methods used, spatial estimation
is based on a set of samples. An individual
sample consists at least of the
coordinates of the sample location and a
measurement of the variable of interest at the
sample location. We may also measure additional,
related variables at the sample location.
Coordinates should be measured to the
highest accuracy and precision that is practical,
given cost and time constraints and the
intended use of the data. Sample variables
should be measured using accurate, standardized,
and repeatable methods.


Sampling


Estimation is based on a sample of
known points. The aim is to estimate the values
for a variable at unknown locations
based on values measured at sampled locations.
Planning will improve the quality of
the samples, and usually leads to a more efficient
and accurate interpolation.


We control two main aspects of the sampling
process. First, we may control the location
of the samples. Samples must be spread
across our working area. However, we may
choose among different patterns in dispersing
our samples. The pattern we choose will
in turn affect the cost of our samples and the
quality of our interpolation. A poor distribution
of sample points may increase errors or
may be inefficient, resulting in unnecessary
costs.


Sample size is the second main aspect of
the sampling process we control. One might
believe the correct number is "as many as
you can afford"; however, this is not always
the case. A law of diminishing returns may
be reached, and further samples may add relatively
little information for substantially
increased costs. Unfortunately, in most practical
applications, the available funds or time
are the main limiting factors. Most surfaces
are undersampled, and additional funds and
samples would usually increase the quality
of the interpolated surface. To date, there
have been relatively few studies or well-established
guidelines for determining the
optimum sample number for most interpolation
methods.

There are times when we control neither
the distribution nor the number of sample
points. This often occurs when we are working
with "found" variables, for example, the
distribution of illness in a population. We
may identify the households where a family
member has contracted a given illness.
Although we can control neither the number
nor the distribution of ill people, we may
wish to use these "samples" in an interpolation
procedure.


Sampling Patterns 

There are several commonly applied
sampling patterns. A systematic sampling
pattern is the simplest (Figure 12-2a),
because samples are spaced uniformly at
fixed X and Y intervals. The intervals may
not be the same in both directions, and the X
and Y axes are not required to align with the
northing and easting grid directions. The
sampling pattern often appears as points
placed systematically along parallel lines.
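
A systematic pattern is simple to generate. The sketch below lays out sample coordinates at fixed X and Y intervals within a rectangular study area aligned with the coordinate axes; the function name, the rectangular extent, and the axis alignment are assumptions made for this illustration.

```python
def systematic_sample(x_min, y_min, x_max, y_max, dx, dy):
    """Sample locations at fixed intervals dx and dy across a rectangle.

    Points start at the lower-left corner and are listed row by row,
    so they read as points placed along parallel lines.
    """
    nx = int((x_max - x_min) // dx) + 1  # samples along each line
    ny = int((y_max - y_min) // dy) + 1  # number of parallel lines
    return [(x_min + i * dx, y_min + j * dy)
            for j in range(ny)
            for i in range(nx)]


# Different spacing in X and Y is allowed.
points = systematic_sample(0, 0, 100, 50, dx=50, dy=25)
print(len(points), points[:3])
```

The spacing values determine sampling intensity everywhere in the study area, which is why this pattern cannot concentrate samples in the more variable parts of a region.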

Systematic sampling has an advantage
over other sampling patterns in terms of ease
in planning and description. Field crews
quickly understand how to lay out the sample
pattern, and there is little subjective
judgement required.

However, systematic sampling may
have disadvantages. It is usually not the
most statistically efficient sampling pattern
because all areas receive the same sampling
intensity. If there is more interest or variation
in certain portions of the study area, this
preference is not addressed by systematic
sampling. The difficulty and cost of traveling
to the sample points is not considered. It
may be difficult or impossible to stay on line
between sampling points. Rough terrain,
physical barriers, or lack of legal access may
preclude sampling at prescribed locations.

In addition, systematic sampling may
introduce a bias, particularly if there are patterns
in the measured variable that coincide
with the sampling interval. For example,
there may be a regular succession of ridges
and valleys associated with underlying geologic
conditions. If the systematic sampling
interval coincides with this pattern, there
may be a bias in sample values.

Random sampling (Figure 12-2b) may
avoid some, but not all, of the problems that
affect systematic sampling. Random sampling
entails selecting point locations based
on random numbers. Typically, both the
easting and northing coordinates are chosen
by independent random processes. These
may be plotted on a map and/or listed, and
then visited with the aid of a GNSS or other
positioning technology to collect the sample.
The points do not have to be visited in the
order in which they were selected, so in
some instances, travel distances between
points will be quite small. On average the
distances will be no shorter than with a systematic
sample, so travel costs are likely to
be at least no worse than with systematic
sampling.
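
Choosing eastings and northings by independent random processes can be sketched with a standard random number generator. The function name, the rectangular study area, and the optional seed (included only so that a sample can be reproduced) are assumptions for this illustration.

```python
import random


def random_sample(x_min, y_min, x_max, y_max, n, seed=None):
    """Draw n sample locations with independently random easting and northing."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            for _ in range(n)]


points = random_sample(0.0, 0.0, 1000.0, 500.0, n=20, seed=42)
for easting, northing in points[:3]:
    print(round(easting, 1), round(northing, 1))
```

The resulting list could be plotted or loaded into a GNSS receiver, and visited in whatever order minimizes travel.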

Random samples have an advantage
over systematic samples in that they are
unlikely to match any pattern in the landscape.
Hence, the chances are lower for
biased sampling and inaccurate predictions.


However, like systematic sampling, random
sampling does nothing to distribute
samples in areas of high variation. More
samples than necessary may be collected in
uniform areas, and fewer samples than
needed may be collected in variable areas. In
addition, random sampling is more complicated
and hence more difficult to understand
than systematic sampling. More training
may be required for crews when
using random sampling. Random
sampling is seldom chosen when sampling
over large areas, due to these disadvantages
and relatively few advantages over alternative
sampling strategies.

Cluster sampling is a technique that
groups samples (Figure 12-2c). Cluster
centers are chosen by some random or
systematic method, with a cluster of samples
arranged around each center. The distances
between samples within a cluster are
generally much smaller than the distances
between cluster centers.

Reduced travel time is the primary
advantage of cluster samples. Travel times
within a cluster are shorter. A sampling
crew may travel several hours to reach a
cluster center, but only a few minutes
between each sample within a cluster.
Cluster sampling is often used in natural
resource surveys that entail significant
off-road travel because of the reduction in
travel times.

There are several variants of cluster
sampling. Cluster centers may be located
randomly or systematically. Samples within
a cluster may also be placed at random or
systematically around the cluster center.
Both approaches have merit, although it is
more common to locate cluster centers at
random and distribute samples within a
cluster according to some systematic pattern.
This approach is used by the U.S. Forest
Service to conduct national surveys of forest
conditions, and by many prospectors during
mineral exploration.
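
The common variant, random centers with a systematic pattern inside each cluster, might be sketched as follows. The circular layout, the spacing, and all names here are assumptions chosen for illustration; they are not the layout used by any specific agency.

```python
import math
import random


def cluster_sample(n_clusters, per_cluster, spacing, bounds, seed=None):
    """Random cluster centers, each with a systematic ring of samples.

    bounds is (xmin, xmax, ymin, ymax). Each cluster holds its center
    plus per_cluster points evenly spaced on a circle of radius spacing.
    Ring points may fall slightly outside bounds near the edges.
    """
    xmin, xmax, ymin, ymax = bounds
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        cx = rng.uniform(xmin, xmax)
        cy = rng.uniform(ymin, ymax)
        cluster = [(cx, cy)]
        for k in range(per_cluster):
            angle = 2.0 * math.pi * k / per_cluster
            cluster.append((cx + spacing * math.cos(angle),
                            cy + spacing * math.sin(angle)))
        clusters.append(cluster)
    return clusters


clusters = cluster_sample(3, 4, 50.0, (0.0, 10000.0, 0.0, 10000.0), seed=7)
for cluster in clusters:
    print([(round(x), round(y)) for x, y in cluster])
```

Distances within a cluster are fixed by the spacing, while distances between clusters depend on the random centers, which is the source of the travel-time savings described above.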

Adaptive sampling is a final method we
will detail. It is characterized by denser
sampling in variable areas and sparse
sampling in uniform areas (Figure 12-2d).
Adaptive sampling greatly increases
sampling efficiency because small-scale
variation is better sampled. Large, relatively
homogeneous areas are well represented by a
few samples, reserving more samples for
areas with higher spatial variation.

Adaptive sampling requires a way to
estimate feature variation prior to field
visits or while in the field, or repeat visits
to the sampling areas. Sample density is
adaptively increased in areas of high
variation. Sometimes it is quite obvious
where the variation is greatest while in the
field. For example, when measuring
elevation, it is obvious where the terrain is
more variable. Sample density may be
increased based on field observations of
steepness.

If there is no method of identifying
where the features are most variable while in
the field, then sample density cannot be
increased "on the spot." Samples may
require office or laboratory analysis to
estimate variation. Sample locations are then
selected based on local variation. The list or
map of coordinate locations may be
generated and used as a guide in collecting
subsequent samples.

Spatial Interpolation Methods


There are many different interpolation
methods. While methods vary, all combine
the sampled values and positions to estimate
values at unmeasured locations.
Mathematical functions are used that
incorporate distance between the
interpolation points and the sample points
with the values at the sample points.
Methods differ in the mathematical
functions used to weight each observation,
and the number of observations used. Some
interpolators use every observation when
estimating values at unsampled locations,
while other interpolators use a subset of
samples, for example, the three points
nearest an unmeasured location.


Different interpolation methods will
often produce different results, even when
using the same input data. This is due to the
differences in the mathematical functions
and number of data points used when
estimating values for the unsampled
locations. Each method may have unique
characteristics, and the overall accuracy of
an interpolation will often depend on the
method and samples used.


Accuracy is often judged by the
difference between the measured and
interpolated values at a number of withheld
sample points. These withheld points are not
used when performing the interpolation, but
are checked against the interpolated surface.
However, no single interpolation method has
been shown to be more accurate than all
others for every application. Each individual
or organization should test several sampling
regimes and interpolation methods before
adopting an interpolation method.

Interpolation methods may produce one
or more of a number of different output
types. Interpolation is often used to estimate
values for a raster data layer. Other methods
produce contour lines. Contour lines are less
frequently produced by interpolation
methods, but are a common way of depicting
a continuous surface. At least one
interpolation method defines polygon
boundaries.


Interpolation to a raster surface involves
estimating unmeasured values at the center
of each raster grid cell. Raster layer
boundaries and cell dimensions are specified,
in turn defining the location of each raster
cell.

We will describe the most common
interpolation methods and apply them all to
a single data set to facilitate comparisons.
Figure 12-3 shows sample points for ozone
data for the eastern United States, collected
by various health and environmental
agencies, and an index value for the 2014
year. Denver is to the extreme left, New
England in the upper right, and Atlanta
indicated by the cluster of points near the
lower right of the figure. Circles are scaled
and colored to reflect the 98th percentile
measurement, in parts per billion (ppb)
during daytime hours, a value related to
injury caused by ozone exposure. Weather,
combustion, chemical release, and
topographic conditions can combine to create
hazardous concentrations, particularly for
vulnerable populations. Since it is expensive
and difficult to make precise ozone
measurements, the network is limited, and
there is a need to interpolate between
sampling stations. These sample points will
be used to demonstrate the application of
various interpolation and spatial prediction
methods in the following sections of this
chapter. Estimated ozone concentration
surfaces for each method will be shown.


Note that the comparisons and figures
are only to illustrate different interpolation
methods. They are not to establish the
relative merit or accuracy of the various
methods. The best interpolation method for
any given application depends on the
characteristics of the variable to be
estimated, the cost


We need an independent error measure
to obtain a good estimate of the interpolation
accuracy. Accuracy estimates may be
obtained with a withheld sample technique,
where the surface is fit to the data
withholding one data point. The error is
estimated at the withheld point as the
difference between the measured and
interpolated value. The sample is then
replaced in the data set, another point is
withheld, and the surface fit and error again
determined. This is repeated for each data
point. A less efficient testing method entails
collecting an independent set of sample
points that

Nearest Neighbor Interpolation


Nearest neighbor interpolation, also
known as Thiessen polygon interpolation,
assigns a value for any unsampled location
that is equal to the value found at the nearest
sample location. This is conceptually the
simplest interpolation method, in the sense
that the mathematical function used is the
equality function, and only one point, the
nearest point, is used to assign a value to an
unknown location.

The nearest neighbor interpolator
defines a set of polygons, known as Thiessen
polygons. All locations within a given
Thiessen polygon have an identical value for
the Z variable (in this and other chapters, Z
will be used to denote the value of a variable
of interest at an X and Y sample location). Z
may be elevation, size, production, or any
other variable we may measure at a point.
Thiessen polygons define a region around
each sampled point that has a value equal
to the value at the nearest sampled point.
The transition between polygon edges is
abrupt; that is, the variable jumps from one
value to the next across the Thiessen
polygon boundary.

The three-dimensional perspective
representation of an interpolated nearest
neighbor surface illustrates some
characteristics of output surfaces (Figure
12-4). Heights in the figure correspond to the
input values at the points. Each polygon has
a uniform value that corresponds to the input
sample value. Polygons are of irregular size,
and values change abruptly along the
polygon edges.
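
A minimal Python sketch of the nearest neighbor rule follows. It evaluates the rule point by point rather than constructing the Thiessen polygons explicitly; the function name and the (x, y, z) tuple layout are assumptions made for the example.

```python
import math


def nearest_neighbor(samples, x, y):
    """Return the Z value of the sample closest to (x, y).

    samples is a list of (x, y, z) tuples. Ties go to the earlier sample.
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]


samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]

# Two locations on either side of the boundary between the first two
# samples take different values, illustrating the abrupt jump.
print(nearest_neighbor(samples, 4.9, 0.0))
print(nearest_neighbor(samples, 5.1, 0.0))
```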

Figure 12-5: Sample points and estimated ozone concentration by Thiessen polygons.


Figure 12-5 shows our ozone sample 
points and Thiessen polygons based on the 
sample points. Note that sampling is denser 
near some urban areas, particularly the Phil- 
adelphia-New York City corridor in the 
upper right, Denver on the left, and St. 
Louis, Dallas, and Atlanta along the mid to 
lower portions of the figure. Thiessen
polygons are smaller where sampling density
is highest.

Thiessen polygons provide an exact
interpolator. This means the interpolated
surface equals the sampled values at each
sample point. The value for each sample
location is preserved, so there is no
difference between the true and interpolated
values at the sample points. Exact
interpolators have this admirable quality,
but often are not the best interpolators at
unsampled points; for example, the Thiessen
polygon method is usually in error at
nonsampled locations, often more so than
other inexact interpolators.


Fixed Radius - Local Averaging 


Fixed radius interpolation is more
complex than nearest neighbor interpolation,
but less complex than most other
interpolation methods. In a fixed radius
interpolation, a raster grid is specified in a
region of interest. Cell values are estimated
based on the average of nearby samples.

The samples used to calculate a cell
value depend on a search radius. The search
radius defines the size of a circle that is
centered on each cell. Sample points found
inside the circle are averaged to interpolate
the value for that cell (Figure 12-6). Points
outside the circle are ignored.

Figure 12-7 shows a perspective view of
fixed radius sampling. Note that there is a
sample data layer, shown at the top of Figure
12-7, vertically aligned with the interpolated
surface. This surface is a raster data layer
with interpolated values in each raster cell.
A fixed radius circle is centered over a raster
cell. The average is calculated for all
samples contained within the sample circle,
and this average is placed in the appropriate
output raster cell. The process is repeated for
each raster cell in the surface. The fixed
radius circles are shown corresponding to
three raster cells, containing three, zero, and
one sample points, respectively. Circles may
contain no points, in which case a zero or no
data value is placed in the raster cell. The
radius for the circle is typically much larger
than the raster cell's width. This means
circles overlap for adjoining cells, causing
neighboring cell values to be similar.

The fixed radius interpolator tends to
smooth the sample data (Figure 12-8). Large
or small values sampled at a given point are
maintained when only that one sample point
falls within a search radius for a cell. Values
are brought toward the overall sample mean
when averaged within a search radius.


The search radius affects the values of
the interpolated surface. Too small a search
radius results in many empty cells, with no
data or null values. Too large a search radius
may smooth the data too much. In the
extreme case, a search radius may be defined
that includes all sample points for all cells,
resulting in a single interpolated value for all
cells. Some intermediate search radius is
chosen.


Fixed radius interpolators are not exact
interpolators because they may average
several points in the vicinity of a sample, and
so they are unlikely to place the measured
value at sample points in the interpolated
surface.
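
The fixed radius procedure can be sketched in Python as below. The function names, the use of None as the no-data value, and the small sample set are assumptions for illustration only.

```python
import math


def fixed_radius(samples, x, y, radius):
    """Mean Z of all samples within radius of (x, y).

    Returns None, standing in for a no-data value, when the circle is empty.
    """
    inside = [z for sx, sy, z in samples
              if math.hypot(sx - x, sy - y) <= radius]
    if not inside:
        return None
    return sum(inside) / len(inside)


def fixed_radius_grid(samples, xmin, ymin, cell, ncols, nrows, radius):
    """Estimate a value at every cell center; row 0 starts at ymin."""
    return [[fixed_radius(samples,
                          xmin + (c + 0.5) * cell,
                          ymin + (r + 0.5) * cell,
                          radius)
             for c in range(ncols)]
            for r in range(nrows)]


samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (10.0, 10.0, 50.0)]
for row in fixed_radius_grid(samples, 0.0, 0.0, 5.0, 2, 2, 3.0):
    print(row)
```

With these numbers most cells are empty, which is the "too small a search radius" case described above. A very large radius would instead give every cell the mean of all three samples.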


Inverse Distance Weighted Interpolation

The inverse distance weighted (IDW)
interpolator estimates the value at unknown
points using the sampled values and distance
to nearby known points. The weight of each
sample point is an inverse proportion to the
distance, thus the name. The farther away
the point, the less weight the point has in
helping define the value at an unsampled
location. Values are estimated by:

$$Z_j = \frac{\sum_i Z_i / d_{ij}^n}{\sum_i 1 / d_{ij}^n} \qquad (12.1)$$

where Z_j is the estimated value for the
unknown point at location j, d_ij is the
distance from known point i to unknown
point j, Z_i is the value for the known point
i, and n is a user-defined exponent. Any
number of points may be used, up to all
points in the sample. Typically, some fixed
number of the closest points is used; for
example, the three nearest sampled points
will be used to estimate values at unknown
locations. Note that n controls how fast a
point's influence wanes with distance. The
larger the n, the smaller the weight
(1/d_ij^n) of a distant point, so the less
influence that point has on the estimate of
the unknown point.


Figure 12-9 illustrates an IDW
interpolation calculation. The three nearest
samples are used. Each measured sample
value is weighted by the inverse of the
distance from the unknown, interpolated
location. These weighted values are added.
The result is divided by the sum of the
weights to "scale" the weights to the
measurement units. This produces an
estimate for the unsampled location.

Figure 12-9: An example calculation for an inverse distance weighted interpolator.
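
The calculation can be sketched in Python as shown below. The sample coordinates and values are invented for illustration and are not the numbers in Figure 12-9; the function name and the parameters power and k (the exponent n and the number of nearest points) are likewise assumptions.

```python
import math


def idw(samples, x, y, power=2.0, k=3):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (x, y, z) tuples; the k nearest are used, each
    weighted by 1 / distance**power.
    """
    nearest = sorted(samples,
                     key=lambda s: math.hypot(s[0] - x, s[1] - y))[:k]
    weighted_sum = 0.0
    weight_sum = 0.0
    for sx, sy, z in nearest:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return z  # at a sample point the estimate equals the sample
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_sum += w
    return weighted_sum / weight_sum


# Three samples at distances 1, 2, and 4 from the unknown point (0, 0)
samples = [(1.0, 0.0, 10.0), (0.0, 2.0, 20.0), (-4.0, 0.0, 40.0)]
print(idw(samples, 0.0, 0.0, power=1.0))  # weights 1, 1/2, 1/4
print(idw(samples, 0.0, 0.0, power=2.0))  # weights 1, 1/4, 1/16
```

With an exponent of 1 the estimate is 30 / 1.75, and with an exponent of 2 it is 17.5 / 1.3125, so the larger exponent pulls the estimate toward the nearest sample value of 10.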


IDW is an exact interpolator.
Interpolated values are equal to the sampled
values at each sampled point. As a d_ij
becomes very small (sample points near the
interpolated location) the 1/d_ij^n becomes
very large. The contribution from the nearby
sample point dwarfs the contributions from
all other points. The relative weights are
very near zero for all i values except the one
very near the sampled point, so the values at
all other points are effectively multiplied by
zero in the numerator of the IDW equation.
The sum in the denominator reduces to the
single weight 1/d_ij^n. The weights on the
top and the bottom of the IDW equation
become more similar, and their ratio
approaches 1. Thus, at a sampled point, the
IDW interpolation formula reduces to:

$$Z_j = \frac{Z_i / d_{ij}^n}{1 / d_{ij}^n} \qquad (12.2)$$

By simple division this is reduced
mathematically to Z_i, the value measured at
the sampling location.

Inverse distance weighting results in
smooth interpolated surfaces (Figure 12-10).
The values do not jump discontinuously at
edges, as occurs with Thiessen polygons,
and sometimes with fixed radius
interpolation. While IDW is easily and
widely applied, care must be taken in
evaluating the values of n and i. The effects
of changing n and i should be tested in an
oversampled case or using retention and
repeat fitting methods, described later, where
adequate withheld points can be compared to
interpolated points. The IDW, and all other
interpolators, should be applied only after
the user is convinced the method provides
estimates with sufficient accuracy. In the
case of IDW, this may involve testing the
interpolator over a range of n and i values,
and selecting the combination that most
often gives acceptable results.
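
One way such a test might be organized is a leave-one-out comparison, sketched below in Python. The sample values, the candidate exponents and point counts, and the use of root mean squared error as the accuracy measure are all assumptions for the example.

```python
import math


def idw(samples, x, y, power, k):
    """IDW estimate at (x, y) from the k nearest (x, y, z) samples."""
    nearest = sorted(samples,
                     key=lambda s: math.hypot(s[0] - x, s[1] - y))[:k]
    num = den = 0.0
    for sx, sy, z in nearest:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def loo_rmse(samples, power, k):
    """Withhold each sample in turn, predict it from the rest, return RMSE."""
    squared = 0.0
    for i, (x, y, z) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        squared += (idw(rest, x, y, power, k) - z) ** 2
    return math.sqrt(squared / len(samples))


# Hypothetical measurements
samples = [(0, 0, 10.0), (10, 0, 12.0), (0, 10, 14.0), (10, 10, 16.0),
           (5, 5, 13.0), (2, 8, 13.4), (8, 3, 12.1), (6, 9, 15.2)]

results = {(n, k): loo_rmse(samples, n, k)
           for n in (1.0, 2.0, 3.0) for k in (3, 5)}
best = min(results, key=results.get)
for (n, k), rmse in sorted(results.items()):
    print(f"n={n} points={k} RMSE={rmse:.3f}")
print("lowest RMSE:", best)
```

The combination with the lowest error on one data set is not guaranteed to be best elsewhere, so the comparison is worth repeating for each new variable and sampling design.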


The size of the user-defined exponent, n,
affects the shape of the interpolated surface
(Figure 12-10). When a larger n is specified,
the closer points become more influential.
Higher exponents result in surfaces with
higher peaks, lower valleys, and steeper
gradients near the sampled points. Contours
become much more concentrated near sample
points when n = 2 (Figure 12-10, top)
than when n = 1 (Figure 12-10, bottom).
These changing shades reflect steeper
gradients near the known data points.

The number of points, i, used to estimate
an interpolated point, j, also affects the
estimated surface, but effects are often
complex and difficult to generalize, because
they depend on the distribution and
magnitudes of the specific sample points. A
larger number of sample points tends to
result in a smoother interpolated surface.


Splines 


A spline is a flexible ruler that was
commonly used by drafting technicians to
create smooth curves through a set of points.
Mathematical spline functions, also referred
to as splines, are used to interpolate along a
smooth curve. These functions serve the
same purpose as the flexible ruler in that
they force a smooth line to pass through a
desired set of points. Spline functions are
more adaptable than their physical
counterparts because they may be used for
lines or surfaces and they may be estimated
and changed rapidly. The sample points are
"guides" through which the spline passes.


Spline functions are constructed from a
set of joined polynomial functions. Line
functions will be described here, but the
principles also apply to surface splines.
Polynomial functions are fit to short
segments. An exact or a least squares method
may be used to fit the lines through the
points found in the segment. For example, a
third-order polynomial may be fit to a
segment (Figure 12-11). A different
third-order polynomial would be fit to the
next line segment. These polynomials are by
their nature smooth curves within a given
segment.

Splines are typically first, second, or
third order, corresponding to the maximum
exponent in the equation used to fit each
segment (e.g., second order for x^2, third
order for x^3 or x^2 y). Segments meet at
knots, or join points. These join points may
fall on a sampled point or they may fall
between sampled points.

Figure 12-11: Diagram of a two-dimensional spline.

Constraints are set on spline functions to
ensure the entire line or surface remains
smooth at the join points. These constraints
are incorporated into the mathematical form
of the function for each segment. They
require that the slope of the lines and the
change in slope of the lines be equal across
segments on either side of the join point.
Typically, spline functions give exact
interpolation (the splines pass through the
sample points) and show a smooth transition
(Figure 12-12). Strictly enforcing exact
interpolation can sometimes lead to artifacts
at the knots or between points. Large loops
or deviations may occur. The spline
functions are often modified to allow some
error in the fit, particularly when fitting
surfaces rather than lines. This usually
removes the artifacts of spline fits, while
maintaining the smooth and continuous
interpolated lines or surfaces.

Figure 12-12: A spline surface.
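
For the line case, an exact third-order spline with these constraints can be sketched in Python. This is one common formulation, a natural cubic spline with knots at the sample points; the zero-curvature end conditions and all names are assumptions for the example rather than the specific form used in any GIS package.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating a natural cubic spline through the points.

    xs must be strictly increasing. Slope and change in slope match at each
    interior knot; the second derivative is set to zero at both ends.
    """
    n = len(xs) - 1                      # number of segments
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # second derivatives at the knots
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i]
                      - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Forward elimination on the tridiagonal system
        for j in range(1, n - 1):
            factor = h[j] / diag[j - 1]
            diag[j] -= factor * h[j]
            rhs[j] -= factor * rhs[j - 1]
        # Back substitution
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for j in range(n - 3, -1, -1):
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        # Locate the segment containing x, clamped to the end segments
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


spline = natural_cubic_spline([0.0, 1.0, 2.5, 4.0], [0.0, 2.0, 1.0, 3.0])
for x in (0.0, 0.5, 1.0, 1.75, 2.5, 3.25, 4.0):
    print(x, round(spline(x), 3))
```

Because the fit is exact, the curve returns the sampled value at every knot. Allowing some error in the fit, as described above, would require a smoothing variant that this sketch does not cover.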


Spatial Prediction

Spatial predictions are based on
mathematical models, often built via a
statistical process. These statistically based
models use coordinate location and measured
or observed independent variables to predict
values for important but unknown dependent
variables. Spatial prediction is different
from interpolation because it uses a
statistical fitting process rather than a
predefined algorithm, and because spatial
prediction uses independent variables as well
as coordinate locations to estimate
unknown variables. We admit that our
distinction between spatial prediction and
interpolation is artificial, but it is useful in
organizing our discussion, and highlights
an important distinction between our
data-driven models and our fixed
interpolation methods.

Spatial predictions are a special case of
general predictive modeling, a major focus 
of applied statistics. There is a rich litera- 
ture devoted to spatial statistics in general, 
and spatial predictive modeling in particu- 
lar. We will only scratch the surface of this 
field; the reader is referred to the introduc- 
tory spatial statistics texts listed at the end 
of this chapter. 

Our discussions will be restricted to
predicting continuous spatial variables.
These variables are conceptualized as
spatial fields that occur across an area, are
measured on an interval/ratio scale, and
typically have values that vary in concert;
that is, they are spatially correlated. This
is in contrast to discrete objects, such as
point, line, or polygon features. While the
occurrence and properties of discrete
features may be predicted using spatial
models, this is less common, and most
discrete object predictions use a different set
of tools that will not be discussed here.

Spatial prediction may be considered
more general than interpolation. Both are
used to estimate values of a target variable
at unknown locations. Interpolation methods
use only the measured target variable
and sample coordinates to estimate the
target variables at unknown locations, while
spatial prediction usually incorporates
additional variables.

Spatial predictions are often improved
due to spatial autocorrelation, which is
the tendency of nearby objects to vary in
concert. High values occur together, as do
low values. Explanations of this common
condition often refer to the observation of
Waldo Tobler, that "everything in the
universe is related to everything else, but
closer things are more related." However, the
nature of the correlation may change from
one variable to the next, or it may change in
space. Correlations may be strong in one
region but poor in another, or positive in one
area and negative in another. We may
improve our predictions if we study the
spatial autocorrelation and incorporate the
correlation structure into our models.


In addition to spatial autocorrelation,
there may be cross-correlation between
different variables: the tendency for two
variables to change in concert. This means
two different variables at the same or nearby
locations may be high or low together
(positive cross-correlation), or highs in one
variable correspond to lows in another
(negative cross-correlation). Spatial
prediction methods may incorporate auto-
and cross-correlation in predictions.


Surfaces with low and high spatial
autocorrelation and with strong
cross-correlation are shown in Figure 12-13.
Figure 12-13a shows two surfaces, Layer 1,
with a high autocorrelation, and Layer 2,
with a low autocorrelation. Scatter diagrams
of sample pairs separated by a uniform, short
lag distance are shown to the right of each
corresponding layer. Higher autocorrelation,
as shown in Layer 1, indicates that points
near each other are alike. A sample from a
surface with high autocorrelation provides
substantial information about the values at
nearby locations (Figure 12-13a, top).
Samples from a surface with low
autocorrelation do not provide much
information about values in the vicinity of
the sample point (Figure 12-13a, bottom).

Two cross-correlated raster layers are
shown in Figure 12-13b. Positive
cross-correlated layers have values that tend
to both be high in some regions and be low in
other regions. Many features are positively
correlated, such as housing prices and
average income, or donut shop density and
number of security guards. Negative
cross-correlation occurs when variables
change in the opposite sense: areas with high
values for one variable are low for the other,
for example, low temperatures at higher
elevations.

The Moran's I statistic is an established
measure of the global, or average, level of
correlation across a data set:

$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (Z_i - \bar{Z})(Z_j - \bar{Z})}{\sum_i (Z_i - \bar{Z})^2} \qquad (12.3)$$

where n is the number of features, Z_i and
Z_j are the variable values at points i and j,
respectively, and Z with a bar above it is the
variable mean. The calculations standardize
individual observations by subtracting the
mean so that the Moran's I range is
constrained, and to ease interpretation.

The w_ij are weight values that decline
with distance, taking a positive value if Z_i
and Z_j are considered neighbors, and 0 if
they are not. Distance declines can be
specified in several ways: with a specific
distance threshold, by direct adjacency such
as requiring shared edges for polygons or
raster cells, or using the rook's or queen's
case adjacencies described in moving window
analysis in Chapter 10. Weights are
typically zero for all other, more distant
features or cells relative to any cell or
polygon.


Figure 12-14 shows a Moran's I
calculation applied to a raster data set. Here
a "rook's case" rule is applied in that
neighbors are only cells that share a full
edge.

A rather sparse weight table may be
constructed to organize the w_ij (Figure
12-14). Each row is derived from the rook's
adjacencies centered on a given cell,
identified in the first column of the table.
There is a non-zero entry in a row for each
rook's case adjacency, and zeros otherwise.
For example, the rook's template centered on
cell d in Figure 12-14 yields a, e, and g as
neighbors to d, so these columns have
non-zero entries in row d in the weights
table. Each row is normalized so that the
summed row entries equal 1, yielding the sum
of all weights to equal the number of
features, and since the sum of the w_ij
equals n, they cancel each other by division
in the Moran's I formula.

Moran's I approaches a value of 1 in
data sets with positive spatial correlation, in
which like values tend to occur together. The
numerator in the equation is the cross
product of features near each other. Each
time a large value occurs near another large
value, the numerator calculation will be
large. If small values also co-occur, they
will both be large negative numbers once the
mean is subtracted, and multiplication will
also result in a large positive number. The
sum of the cross product will then be large
for spatially correlated data sets, but
standardized by the denominator to be near
1.


If there is no spatial sorting in a data set,
large values are just as likely to be near
small values as elsewhere, with positive
values tending to balance negative values,
making the sum in the Moran's I numerator
near zero. The spatial layers have low spatial
correlation because knowing a value at a
location does not provide much information
about values in adjacent locations; they
are just as likely to be different or similar to
the observed value.

Moran's I approaches -1 when values are
anticorrelated: a large value is more likely
to be next to small values than next to other
large values.
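
The global calculation can be sketched in Python for a small raster. The sketch assumes rook's case neighbors with row-normalized weights, as in the discussion above; the function names and the two test grids are invented for illustration and are not the values in Figure 12-14.

```python
def rook_weights(nrows, ncols):
    """Row-normalized rook's case weights as {cell: {neighbor: weight}}."""
    weights = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            weights[(r, c)] = {nb: 1.0 / len(nbrs) for nb in nbrs}
    return weights


def morans_i(grid):
    """Global Moran's I for a raster given as a list of rows.

    Assumes at least two cells and values that are not all identical.
    """
    nrows, ncols = len(grid), len(grid[0])
    n = nrows * ncols
    mean = sum(sum(row) for row in grid) / n
    dev = {(r, c): grid[r][c] - mean
           for r in range(nrows) for c in range(ncols)}
    w = rook_weights(nrows, ncols)
    w_sum = sum(sum(row.values()) for row in w.values())
    cross = sum(wij * dev[i] * dev[j]
                for i, row in w.items() for j, wij in row.items())
    squares = sum(d * d for d in dev.values())
    return (n / w_sum) * cross / squares


clustered = [[0, 0, 1, 1]] * 4          # like values side by side
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered))
print(morans_i(checkerboard))
```

The clustered grid gives a clearly positive value, while the checkerboard, in which every neighbor differs, gives -1.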

Moran's I Calculations

Note that different software may
calculate different values for Moran's I
when using the same data. This is largely
due to how the weight matrix is constructed,
and if and how row normalization is
implemented. Different weights will give a
different answer, such that subtle
differences in their creation will give
slightly to largely different numbers.
Analysts should be careful in comparing
values across software packages unless they
can control all aspects of the calculations,
particularly the weight matrix.


The Moran's I is termed a global
statistic because it identifies the average or
area-wide spatial correlation. It doesn't
identify how correlation varies across the
area of interest. A local Moran's I, or LISA
(local indicator of spatial autocorrelation),
is often used to identify local clustering.
Spatial correlation is mapped by calculating
an index for each feature, and then plotting
the correlation at each feature. Under the
typical standardization, local weights sum to
1, and the local Moran's I is calculated as:

$$I_i = \frac{Z_i - \bar{Z}}{\sum_k (Z_k - \bar{Z})^2 / n} \sum_j w_{ij} (Z_j - \bar{Z}) \qquad (12.4)$$

We must remember that the Z-bar in
the above equation is the mean of all
features, and Z is summed over all features
in the data set in the denominator of the
fraction. Weights w_ij are defined as zero
over most of the data set, with nonzero
values only "near" the local feature, and sum
to one.
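
A local index of this kind can be sketched in Python as follows. The sketch assumes the common form in which each feature's deviation is multiplied by the weighted mean deviation of its rook's case neighbors and divided by the mean squared deviation of all features; the function name and test grid are illustrative assumptions.

```python
def local_morans_i(grid):
    """Local Moran's I for every cell of a raster given as a list of rows.

    Uses rook's case neighbors with equal weights that sum to one.
    Assumes at least two cells and values that are not all identical.
    """
    nrows, ncols = len(grid), len(grid[0])
    n = nrows * ncols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]
    m2 = sum(d * d for row in dev for d in row) / n
    local = []
    for r in range(nrows):
        out_row = []
        for c in range(ncols):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            lag = sum(dev[rr][cc] for rr, cc in nbrs) / len(nbrs)
            out_row.append(dev[r][c] * lag / m2)
        local.append(out_row)
    return local


clustered = [[0, 0, 1, 1]] * 4
for row in local_morans_i(clustered):
    print([round(v, 3) for v in row])
```

Cells deep inside a block of like values score highest, and cells along the boundary between the two blocks score lower. With these weights the average of the local values equals the global Moran's I for the same grid.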

Other neighborhoods may be specified,
changing the number of neighbors and their
weights for each focal feature. A "queen's
case" neighborhood may be adopted, with
all adjacent cells participating and an
equal weight or distance-dependent weight
set that sums to one. Weights for vector
data sets may be more complicated, but
usually are specified by shared edges,
nodes, or with centroids within a specified
proximity.


There are many other global and local
indices of spatial autocorrelation, including
Geary's C, or the G_i of Getis and Ord
(1992), and they perform in a manner
similar to Moran's I. The indices vary in how
they estimate the correlation and in the
specific calculations of relatedness and
separation. These and a number of additional
topics are quite well covered in the
suggested reading at the end of this chapter.


Spatial Regression 


Spatial regression and other statistically
based models typically use observations of
dependent variables, other independent
variables, and sample coordinates to develop
prediction equations. For example, we
estimate temperature across a region using a
network of temperature stations. We may
interpolate as described in the previous
section to estimate temperature, using only
the station coordinates and the
corresponding temperature measurements.
However, we may note a strong cooling
trend with elevation, and combine
temperature measurements with elevation,
latitude, and longitude in a statistical model
that provides better temperature predictions.
We would then use this model to estimate
raster temperature layers for the region.

Spatial predictions are often described
mathematically by a general function, such
as

$$Z_i = f(x_i, y_i, \alpha_i, \beta_j) \qquad (12.5)$$

where Z_i is the estimated output value at
the coordinates x_i, y_i at point i; the
α_i are variables measured at point i; and
the β_j are variables measured at other
locations.


Trend Surface and Simple Spatial Regression


Trend surface prediction is a type of
spatial regression that involves fitting a
statistical model, or trend surface, through
the measured points. The surface is typically
a polynomial in the X and Y coordinate
system. For example, a second-order
polynomial model would be:

$$Z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 y^2 + a_5 xy \qquad (12.6)$$

where Z is the value at any point x and y, and
each a_p is a coefficient estimated in a
regression model. Least squares methods,
described in most introductory statistical
textbooks, are used to estimate the best set of
a_p values. The a_p values are chosen to
minimize the sum of squared differences
between the measured Z values and the
prediction surface.
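
A least squares fit of the second-order surface can be sketched in pure Python. The sketch solves the normal equations by Gaussian elimination, which is adequate for a small illustration but less numerically robust than the methods in statistical software; the function names and sample values are assumptions for the example.

```python
def fit_trend_surface(samples):
    """Least squares coefficients [a0..a5] for
    z = a0 + a1*x + a2*y + a3*x^2 + a4*y^2 + a5*x*y.

    samples is a list of (x, y, z); at least 6 well-spread points are needed.
    """
    rows = [[1.0, x, y, x * x, y * y, x * y] for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    p = 6
    # Augmented normal equations: (A^T A | A^T z)
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * z for r, z in zip(rows, zs))]
           for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coeffs = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, p))
        coeffs[i] = (aug[i][p] - known) / aug[i][i]
    return coeffs


def predict(coeffs, x, y):
    """Evaluate the fitted trend surface at (x, y)."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1 * x + a2 * y + a3 * x * x + a4 * y * y + a5 * x * y


# Hypothetical samples on a 4 x 4 grid with some local departures
samples = [(float(x), float(y),
            20.0 - 0.8 * y + 0.1 * x * x + (0.5 if (x + y) % 3 == 0 else 0.0))
           for x in range(4) for y in range(4)]
coeffs = fit_trend_surface(samples)
print([round(a, 3) for a in coeffs])
print(round(predict(coeffs, 1.5, 1.5), 3))
```

Because the fit minimizes squared differences over all samples, the surface will generally not reproduce any individual measurement exactly.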

There must be at least one more sample
point than the number of estimated a_p
coefficients due to statistical constraints.
This isn't a problem for most applications,
because the best polynomial models often
have fewer than 10 coefficients, far fewer
than typical sample sizes.

Trend surfaces are not exact predictors
in that the surface typically does not pass
through the measured points. There is a
difference between the interpolated surface
and the measurement at most locations.
Trend surfaces are often among the most
accurate methods when fitting smoothly
varying surfaces, such as mean daily
temperature. Trend surfaces typically do not
have the "bull's-eye" artifacts that result
from excessive local influence in inverse
distance weighted interpolators.

Trend surface methods often perform poorly when there is a highly
convoluted surface (Figure 12-15). Ozone as shown in the raw
observations can change rapidly over short distances, as can
precipitation from a single summer thunderstorm, or population
density in a mixed-use neighborhood; this type of abrupt variation
is often poorly estimated with a trend surface. Even high-order
polynomials may not be sufficiently flexible to fit these complex,
convoluted surfaces.


Trend surfaces may be extended to include independent variables
that provide some help in predicting the variable of interest:

Z = a_0 + a_1 x + a_2 y + a_3 Q + a_4 W    (12.7)

where x and y are the coordinate locations, Q and W are
independent variables measured at the point (x, y), and Z is the
dependent variable to be predicted at the point (x, y). The a_p
values are coefficients for the predictive equation, usually
estimated through a least squares statistical process. The value Z
may be predicted at any location where we have values for x, y, Q,
and W.


Kriging and Co-Kriging 


Kriging is a statistically based estimator of spatial variables.
It differs from the trend surface approach in that predictions are
based on regionalized variable theory, which includes three main
components. The first component is the spatial trend, an increase
or decrease in a variable that depends on direction; for example,
precipitation may decrease towards the west.


The second component describes the local spatial autocorrelation,
that is, the tendency for points near each other to have similar
values. Kriging is unique and powerful because we use the observed
change in spatial autocorrelation with distance to estimate values
at our unknown locations.


The third component in the prediction is random, stochastic
variation. These three components are combined in a mathematical
model to develop an estimation function. The function is then
applied to the measured data to estimate values across the study
area.


Much like IDW interpolators, weights in kriging are used with
measured sample variables to estimate values at unknown locations.
With kriging, the weights are chosen in a statistically optimal
fashion, given a specific kriging model and assumptions about the
trend, autocorrelation, and stochastic variation in the predicted
variable.


Kriging methods are the centerpiece of geostatistics, initially
developed in the mid-1900s for use in mining. Ore samples may be
expensive to obtain or process, and accurate predictions of mineral
occurrence and density are both difficult and valuable. Kriging
estimators were developed to incorporate trends, autocorrelation,
and stochastic variation, and also to provide some estimate of the
local variance in the predicted variable.


Kriging uses the concept of a lag distance, often symbolized by
the letter h. Consider the sample set shown in Figure 12-16. Each
value for the variable Z is shown plotted over a region. Individual
points may be listed as Z_1, Z_2, Z_3, etc., to Z_k, when there are
k sample points. The lag distance for a pair of points is the
distance between them, and by convention is denoted by h. The lag
distance is calculated from the X and Y coordinate values for the
sample points, based on the Pythagorean formula. In our example in
Figure 12-16, the lag (horizontal) distance


Figure 12-16: Lag distances, used in calculating semivariances for
kriging.

Figure 12-17: A lag tolerance defines a range for grouping. The
grouping aids estimation of spatial covariance.


between the locations of sample points Z_1 and Z_2 is
approximately 6 units. The difference in values measured at those
points, Z_1 - Z_2, is equal to 11. Each pair of sample points is
separated by a distance, and also has a difference in the values
measured at the points.
For example, 2, is 24 units from 24. and Z, 
is 5 units from z. Each pair has a given dif 
ference in the Z values; for example, Z, 
minus 2¢ is 4. Every possible set of pairs Zo, 
Z_b defines a distance h_ab, and differs by the amount Z_a - Z_b.
The distance h_ab is known as the lag distance between points a
and b, and in general there is a subset of points in a sample set
that are a given lag distance apart.

Lag distances often are applied with an associated lag tolerance.
A lag tolerance defines a small range that is "close enough" to a
lag distance (Figure 12-17). A lag tolerance is required because
the individual lag distances typically are not repeated in the
sample data. Most or all distances between sample points are
unique, so there is little or no replication with which to
calculate the variability at each lag. Some distances may be quite
similar, but distances usually will differ in the smallest decimal
places. A lag tolerance circumvents this problem.


The lag tolerance defines when distances are similar enough to be
grouped in spatial covariance calculations. For example, we may
wish to calculate the semivariance for points that are 112 meters
apart. If we are inflexible and only use point pairs that are
exactly 112 meters apart (within the precision of our measurement
system), we may have only a few, or perhaps even no points that
meet this strict criterion. By allowing a tolerance, distances that
are plus or minus that tolerance from the given lag distance can be
used to calculate a spatial variability. For example, we might set
a lag tolerance of 10 units. Any pair of points between 102 and 122
units apart is used to calculate an index of spatial covariance for
the lag distance h = 112.

Geostatistical prediction uses the key concept of a semivariance
to represent spatial covariance. A semivariance is the variance
based on nearby samples, and it is defined mathematically as:

\gamma(h) = \frac{1}{2n} \sum (Z_a - Z_b)^2    (12.8)

where Z_a is the variable measured at one point, Z_b is the
variable measured at another point h distance away, and n is the
number of pairs that are approximately the distance h apart.

The semivariance at a given lag distance h is a measure of spatial
autocorrelation at that distance. Note that when nearby points
(small h) are similar, the difference Z_a - Z_b is small, and so
the semivariance is small. High spatial autocorrelation means
points near each other have similar Z values.

The semivariance may be calculated for any h. For example, when
h = 1, the semivariance \gamma(h) may be equal to 0.3; when h = 2,
then \gamma(h) may be 0.5; when h = 3, then \gamma(h) may be 0.8.
We may calculate a semivariance provided there are sufficient point
pairs that are h distance apart to give a good estimate.
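
As a rough sketch of the calculation, the semivariance at one lag can be computed by collecting every pair of samples whose separation falls within the lag tolerance. The function name `semivariance` and the three sample points below are illustrative assumptions, not data from the figures.

```python
import math


def semivariance(samples, lag, tolerance):
    """Empirical semivariance at one lag: sum((Za - Zb)^2) / (2n).

    samples is a list of (x, y, z) tuples. Returns (gamma, n), where n
    is the number of pairs within lag +/- tolerance; gamma is None
    when no pair qualifies.
    """
    squared_diffs = []
    for i in range(len(samples)):
        xa, ya, za = samples[i]
        for j in range(i + 1, len(samples)):
            xb, yb, zb = samples[j]
            # Lag distance from the Pythagorean formula.
            h = math.hypot(xa - xb, ya - yb)
            if abs(h - lag) <= tolerance:
                squared_diffs.append((za - zb) ** 2)
    n = len(squared_diffs)
    if n == 0:
        return None, 0
    return sum(squared_diffs) / (2 * n), n


# Three illustrative samples along a line.
points = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
for lag in (1, 2):
    print(lag, semivariance(points, lag, tolerance=0.1))
```

Repeating the call over a series of lags gives the points that are plotted in a variogram.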


We may plot the semivariance over a range of lag distances
(Figure 12-18), and this plot is known as a variogram or
semivariogram (note that there is ongoing debate over which of
these terms is best used). A variogram summarizes the spatial
autocorrelation of a variable. Note that the semivariance is
usually small at small lag distances, and increases to a plateau as
the lag distance h increases. This is the typical form of a
variogram. The nugget is the initial semivariance when the
autocorrelation typically is highest. The nugget is shown at the
left of the diagram in Figure 12-18, the semivariance at a lag
distance of zero. This is the intercept of the variogram. The sill
is the point at which the variogram levels off. This is the
"background" variance, and may be thought of as the inherent
variation when there is little autocorrelation. The range is the
lag distance at which the sill is reached. The nugget, sill, and
range will differ among spatial variables.

A set of sample points is used to estimate the shape of the
variogram. First, a set of lag distances h_1, h_2, h_3, etc., are
defined; each distance signifies a given lag distance,


Figure 12-18: An idealized variogram, with the nugget, sill, and
range. The horizontal axis is the lag distance h.

Figure 12-19: A variogram.


plus or minus the lag tolerance. The semivariance is then
calculated for each lag distance. An example is shown in Figure
12-19. Remember, each of these points is calculated from equation
12.8 for a given lag distance. A line may then be fit through the
set of semivariance points, and the variogram estimated. This line
is sometimes called the variogram model.

Spatial prediction is among the most important applications of the
variogram model (Figure 12-20). There are many variations and types
of kriging models, but the simplest and most commonly applied rely
on the variogram to estimate "optimal" weights for prediction.
These weights are used to estimate values at unknown locations by:

\hat{v} = \sum_{j=1}^{n} w_j v_j    (12.9)

where \hat{v} is the estimated value at an unmeasured point, w_j
are weights for each sample, and v_j is the known value at sample
point j.

Weights are optimal in the sense that they minimize the error in a
prediction, and they are unbiased, given a specific data set and
model. The calculation of optimal weights requires some rather
involved mathematics, beyond our present scope, but is described in
great detail in references listed at the end of this chapter.


Figure 12-20: Sample points and estimated values.

Estimating each w_j involves a constrained minimization process. A
set of equations may be written that expresses the errors as the
differences between our measured values and the predicted values by
a function of a set of unknown weights. This set of equations is
solved under the constraints that the weights sum to one and the
error variance is minimized. The solution involves calculating the
expected values of covariances between points according to a
variogram model, for example, by fitting a smooth relationship
between the observed semivariogram points, as shown in Figure
12-19. The covariances are a function of the specific lag distances
observed in the sample, and are used to solve for the optimal set
of weights in equation 12.9.
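
To make the constrained solution more concrete, here is a minimal sketch of ordinary kriging, one common kriging variant, in which the weights are forced to sum to one through a Lagrange multiplier. The exponential variogram model, its nugget, sill, and range values, the three sample points, and all function names are assumptions chosen for illustration; a real analysis would fit the variogram model to the observed semivariances first.

```python
import math


def exponential_variogram(h, nugget=0.0, sill=1.0, rng=10.0):
    """Assumed variogram model; the semivariance at zero lag is zero."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ordinary_kriging(samples, x0, y0, variogram=exponential_variogram):
    """Return (estimate, weights) at (x0, y0) from (x, y, value) samples."""
    n = len(samples)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between samples, bordered by the constraint row and
    # column that force the weights to sum to one.
    a = [[variogram(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the prediction location.
    b = [variogram(dist(s, (x0, y0))) for s in samples] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    return estimate, weights


samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]
estimate, weights = ordinary_kriging(samples, 3, 3)
print(estimate, weights, sum(weights))
```

With a zero nugget, predicting at a sample location returns the sample value, since all of the weight falls on that sample.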

As stated earlier, kriging is similar to IDW interpolation in that
a weighted average is calculated. However, kriging uses the minimum
variance method to calculate the weights, rather than applying some
arbitrary and perhaps more imprecise weighting scheme as with IDW.


Co-kriging is an extension of kriging that includes the
measurement of a separate, correlated variable at the sample
locations in addition to the variable of interest. There may be a
secondary variable that is to some extent related to the primary
variable, but that is easier or less expensive to measure. In many
analyses, temperature might be a primary variable and elevation a
secondary variable. Co-kriging exploits the covariance between the
primary and secondary variables to improve our estimate of the
primary variable. Co-kriging is similar to kriging in that a set of
optimal weights is estimated, but co-kriging has weights for both
the primary and secondary variables.


Spatial prediction with kriging, co-kriging, and other
geostatistical methods can be a complex and nuanced process. There
is a wide range of possible models that in part depend on the
characteristics of the data. Different data characteristics
indicate particular modeling methods or model forms, for example,
if there are trends in the data, or directional differences in the
variance. These considerations are beyond the scope of our present
discussion, and the interested reader is referred to more complete
treatments, such as Isaaks and Srivastava (1989) or McKillup and
Dyar (2010), listed under suggested reading at the end of this
chapter.

There are more advanced spatial prediction methods, and spatial
estimation is an active area of research, with more complex
techniques such as spatial Bayesian estimation and space-time
models. These topics are more appropriately treated in more
advanced courses and texts.


Prediction Accuracy 


We often need to characterize the accuracy of our spatial
estimations. This helps us choose the best model and place limits
on model application. Model assessment is a well-developed field,
and will not be thoroughly reviewed here, but a few main concepts
are introduced.


Accuracy is measured at assessment points, locations where we know
both the true value and the estimated values for a variable. We
often describe a sample set with n points, with estimated or
interpolated values at any ith point denoted by P_i, and the true
or observed value at the point denoted by O_i. Each assessment
point provides an error estimate:

e_i = P_i - O_i    (12.10)


There are several metrics that are commonly used to characterize
aggregate error, perhaps chief among them the root mean squared
error:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}    (12.11)

Error values are squared to remove the sign effect, and then the
square root is taken of the mean to return to the measured unit
scale, instead of a squared unit scale. Predictions either above or
below the observed values are generally considered to be equally
bad, and the error is averaged over all samples. However, squaring
the errors magnifies the influence of outliers, extremely large
positive or negative errors, so some argue that this is an overly
pessimistic estimate of error, at least when there are large
outliers.

The mean absolute error is an alternative error metric, less often
used but less sensitive to outliers than the RMSE. The MAE is
defined as:

MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|    (12.12)

It substitutes the absolute value operation for the squaring and
square root operations and so is less sensitive to outliers, but
otherwise is quite similar to the RMSE.

Another accuracy metric is the mean bias error:

MBE = \frac{1}{n} \sum_{i=1}^{n} e_i    (12.13)

MBE measures the average bias in the predictions, the amount by
which, on average, an estimated surface over- or underpredicts the
true values. MBE conveys useful information overall but provides
little information on the magnitude of individual errors and should
be used in conjunction with RMSE, or preferably, MAE.


Overall measures of agreement between an estimated and true
surface have been proposed, including Willmott's index of
agreement:

d = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} (|P_i - \bar{O}| + |O_i - \bar{O}|)^2}    (12.14)

where \bar{O} is the mean of the observed values. Primary citations
of these and other accuracy metrics are provided at the end of this
chapter, and in the considerable literature on interpolation and
spatial estimation.
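
The aggregate error metrics above are simple to compute once predicted and observed values are paired. The sketch below assumes the error convention e_i = P_i - O_i and the original (1981) form of Willmott's index of agreement; the function name `accuracy_metrics` and the three value pairs are illustrative only.

```python
import math


def accuracy_metrics(predicted, observed):
    """Return RMSE, MAE, MBE, and Willmott's d for paired value lists."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    o_mean = sum(observed) / n
    # Denominator of Willmott's index; zero only if every predicted and
    # observed value equals the observed mean.
    denom = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                for p, o in zip(predicted, observed))
    d = 1.0 - sum(e * e for e in errors) / denom
    return {"rmse": rmse, "mae": mae, "mbe": mbe, "d": d}


print(accuracy_metrics(predicted=[2.0, 4.0, 6.0], observed=[1.0, 5.0, 6.0]))
```

In this small example the positive and negative errors cancel, so MBE is zero even though RMSE and MAE are not, which is why MBE is best read alongside one of the magnitude metrics.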

Assessing the accuracy of an interpolated surface requires we
collect both observed and predicted values at a set of points. In
an ideal assessment, these would be independent of the samples we
use to estimate the surface, but this is rarely possible. Samples
are often expensive, difficult to collect, and sparse, and most
interpolated surfaces would benefit from additional sampling. If
each new sample can materially improve our interpolation, we are
hard-pressed to hold them in reserve for an accuracy assessment. We
are tempted to use most or all of our samples while interpolating,
and leave few or none for an accuracy assessment.

Exact interpolators are particularly vexing. As you might recall,
Thiessen polygons, inverse distance weighted, and some spline
interpolators have zero error at all sample points by definition,
because they are formulated to exactly return the observed values
at the fitted points. One might think that we must hold a set of
points in reserve in order to get a true estimate of the
interpolator accuracy.

There are related techniques, known variously as leave-one-out,
bootstrapping, or cross-validation, which address both the
undersampling and robust accuracy estimation requirements.
Leave-one-out cross-validation fits the surface as many times as
there are sample points, each time withholding one of the points.
We fit the surface the first time, withholding the first point. We
can then subtract the withheld measured value (O_1) from the
interpolated value (P_1), and obtain one estimate of the error. We
then repeat this process for the rest of the sample points. For n
samples, we fit the surface n times. We can then compare each
withheld point's true value, O_i, to the fit value, P_i, giving us
n error values, e_i. We can then apply equations 12.10 through
12.14 to characterize the accuracy of each surface.
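
The withholding procedure can be sketched as follows. Inverse distance weighting stands in as the interpolator only because it is compact; any surface-fitting function with the same call pattern could be substituted. The names `idw` and `leave_one_out_errors`, the power of 2, and the sample points are assumptions for illustration.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (x, y, z) samples."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz  # exact interpolator: return the value at a sample
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den


def leave_one_out_errors(samples, interpolate=idw):
    """Withhold each sample in turn and return e_i = P_i - O_i for each."""
    errors = []
    for i, (x, y, observed) in enumerate(samples):
        withheld = samples[:i] + samples[i + 1:]
        errors.append(interpolate(withheld, x, y) - observed)
    return errors


points = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 3.0)]
errors = leave_one_out_errors(points)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(errors, rmse)
```

Because each estimate is made without the withheld point, the errors are nonzero even though the interpolator is exact at its fitted points.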

Cross-validation or similar accuracy estimates should be used
because the RMSE estimated from the fit points often gives an
optimistic estimate of accuracy, particularly when sample size is
small. If not provided by the GIS software used in fitting, then
sample data should be exported to a statistically oriented surface
fitting system, for example, the open source statistical package R.


Core Area Mapping


Core area mapping is another common and useful spatial analysis
tool. A core area is a primary area of influence or activity for an
organism, object, or resource of interest. Detectives may wish to
map a series of burglaries to uncover clustering or patterns in
occurrence. Wildlife managers may wish to map the home range of an
endangered organism, or a business owner the home locations of her
customers.


Core area mapping typically involves point or line observations.
Individual burglaries, for example, are recorded as point
locations, perhaps tagged to the address or building where they
occurred. These points may be used to define a polygon by one of
several core area mapping techniques. In this way, the core area is
a higher dimensional spatial object (area) that is defined from a
set of lower dimensional objects (points or lines). This core area
represents some central or important region where features occur
frequently, in this example, burglaries. Additional resources may
be focused on this core area, such as increased patrols or
surveillance.


Core area mapping is commonly used. Perhaps the most frequent
applications to date have involved analysis of patterns of human
activity such as crime occurrence, disease surveillance, or
business activity. In addition, plant and animal species densities
are often analyzed and summarized using these methods, particularly
when the organism is highly valued or endangered. Resource managers
record organism occurrences in the field, perhaps using GPS or
other spatial positioning technologies. These observations may be
combined and abundance patterns analyzed after a sufficient number
of observations has been gathered. Core areas may be identified,
and key habitat conditions or requirements inferred. These may
guide management actions such as the protection of areas with a
high concentration of species and the enhancement of other areas by
adding key habitat requirements.


Mean Center and Mean Circle 


The mean center and associated mean circle are perhaps the
simplest and most obvious measures of a central location and a core
area. The mean center is simply the average X and Y coordinates of
the sample points. Each sample point has an associated pair of
coordinates. These may be summed and the average calculated, and
this mean point identified as the center of the core area. Mean
circles may be associated with the mean center to define a core
area (Figure 12-21). The mean circles are defined by a radius
measured from the mean center. The mean circle radius is commonly
the distance to the farthest sample point, the average distance
from the mean center to the set of sample points, or some other
statistical measure based on the variance of the distance to sample
points. These distances may be calculated easily from the sample X
and Y coordinates, first by calculating the mean, and then by
applying the general formulas to


Figure 12-21: An example of a mean center and possible mean
circles for a set of sample points. Axes: X coordinate and Y
coordinate.


calculate distance from sample points to the mean center. The
largest distance, average distance, or the standard deviation of
the distance from points to the center then may be determined.
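
A minimal sketch of these calculations follows. The function name `mean_center_and_radii` and the four sample coordinates are hypothetical, and the population form of the standard deviation is an assumption.

```python
import math


def mean_center_and_radii(points):
    """Return the mean center and three candidate mean circle radii.

    points is a list of (x, y) tuples. The radii are the maximum
    distance, the mean distance, and the standard deviation of the
    distances from the sample points to the mean center.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_d = sum(dists) / n
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / n)
    return (cx, cy), {"max": max(dists), "mean": mean_d, "std": std_d}


center, radii = mean_center_and_radii([(0, 0), (2, 0), (2, 2), (0, 2)])
print(center, radii)
```

Adding a single distant point to the list shows how strongly the maximum distance radius responds to an outlier compared with the mean distance.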


Mean circles have the advantages of simplicity and ease of
construction, but they assume a uniformly circular shape for the
core area. Some measures of mean center may be biased by extreme
points; for example, the maximum distance circle in Figure 12-21.
Note that the outlier near X = 15 and Y = 175 results in a large
maximum distance circle. This circle contains substantial area with
no points nearby, and it is probably an overestimation of the core
area. It is not clear that the mean distance or standard error
distance circles are better at defining a core
area, The core areas defined by these mea- 
sures may be appropriate for some applica- 
tons bl hey sche vo мыш in er 
Some multiple of the mean distance or standard error may be chosen
based on statistical assumptions, or past experience. For example,
if we assume the samples follow a random normal distribution, then
a core area defined by a circle approximately 1.8 times the
standard error distance should contain 68% of the data. Previous
experience may help; for example, one might know that in a
particular region, 90% or more of a wolf pack core area is within
10.8 km of a mean center.

In many cases, circular core areas are suboptimal because many
variables are known to exhibit nonregular shapes, and a circular
core area is identified when using the mean center / mean circle
methods. While mean circle methods are often used in exploratory
data analyses, other methods have been developed to more
effectively identify irregularly shaped core areas.


Convex Hulls 


Convex hulls, also known as minimum convex polygons, are perhaps
the simplest way to identify core areas with irregular shapes. A
convex hull is the smallest polygon created by edges (lines) that
completely enclose a set of points, and for which all exterior
angles between edges are greater than or equal to 180 degrees
(Figure 12-22). An exterior angle is measured from

Figure 12-22: A set of points (a) defines a polygon. Hulls may be
characterized as concave (b), when some exterior angles are less
than 180 degrees, or convex (c), when all exterior angles are
greater than or equal to 180 degrees.

one edge or side to another through the region "outside" of a
polygon. Squares, triangles, and regular pentagons are all examples
of convex hulls, while stars and crosses are examples of nonconvex
hulls. While these geometric figures have regular shapes, most
convex hulls derived from sampled points will not.

Convex hulls are often considered a natural bounding area for a
set of points. This assertion is accepted by most analysts when
there are no outlying data points, far removed from the rest. When
outliers are present, the convex hull will often be unreasonably
large.


Convex hulls are widely used because they are simple to develop
and interpret, and there is little or no subjectivity in their
application. The shape of the convex polygon is determined solely
by the arrangement of the sample points, and not by controlling
parameters that must be specified by the human applying the method.
They represent the irregular shapes common to most sampling.

A convex hull may be easily created with a "sweep" algorithm
applied to a set of sample points (Figure 12-23). These are the
locations of the events of interest, for example, observations of a
rare animal or crime locations. An extreme point is identified from
the set, usually the sample with the largest or smallest X or Y
coordinate (point s in Figure 12-23a). The angles of deflection
from the current point to all other points are calculated, and the
smallest positive clockwise or counterclockwise angle and
corresponding point are identified (Figure 12-23b). This point is
the next in the convex hull, and becomes the starting point for the
next calculation. This process is repeated until the starting point
is reached (Figure 12-23c and d).
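
The sweep can be sketched in code as a "gift wrapping" pass around the points. Rather than computing deflection angles explicitly, this sketch uses a cross-product orientation test, which selects the same next point; that substitution, the function names, and the sample coordinates are choices made for illustration.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counterclockwise order, starting at the
    point with the smallest x (ties broken by smallest y)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    start = pts[0]  # extreme point
    hull, current = [], start
    while True:
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Prefer the point furthest clockwise from the candidate;
            # among collinear points keep the most distant one.
            if turn < 0 or (turn == 0 and
                            dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) does not appear in the result, which illustrates that points inside the hull have no influence on its shape.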

Convex hulls are often considered a natural bounding area for a
set of points. However, convex hulls often ignore clustering in the
data. A dense cluster of points in an interior region does not
influence the shape of the core area. We lose much of the
information on density or frequency of occurrence in the interior
region of the bounding polygon. Algorithms defining optimum concave
polygons have been developed, generally fitting convex hulls to
successive subsets of bounding points, and discarding outlier
points, or areas defined by the outlying points. One such method is
described next.


Figure 12-24, panel (a): convex hull.

Characteristic Hull Polygons


An alternative to convex hulls has been developed, known as
characteristic hull polygons (CHP). A Delaunay triangulation is
created among the sampled points, the same method described in
Chapter 2 when developing a triangulated irregular network. A set
of minimum spanning triangles is created, and this set of triangles
is winnowed to remove a largest area or longest perimeter subset.
Figure 12-24a shows a set of sample points and the resulting convex
hull, while Figure 12-24b shows the Delaunay triangulation for the
same set of points. In this example, the top 5% of polygons with
the longest perimeter have been discarded, and the remainder shaded
to represent a core area. This reduces the influence of distant
points and allows for "holes" embedded within a core area, two
advantages over convex hulls. One must choose whether to use area,
perimeter, or some other metric of size, so the resultant CHP size
and shape depend on the threshold value chosen, for example, the 5%
vs. the 10%


Figure 12-24, panel (b): characteristic hull polygons.

largest polygons; however, the method is easy to apply and
arguably provides a better estimate of core areas when compared to
a convex hull, particularly when outliers are frequent.


Kernel Mapping 


Kernel mapping uses a set of sample locations to estimate a
continuous density surface. Kernel mapping is widely applied
because it is mathematically flexible, relatively easy to
implement, may be robust to outliers, readily incorporates
clustered samples, can represent irregular core areas, and is
statistically based.

Kernel mapping is based on a density distribution that is assumed
for each sample point. These density distributions are placed over
the sample plane, one for each observation point, and vertically
added to determine the composite density from the sample. This
composite density may be used to identify a core area, selecting
the densest areas first.


Figure 12-25 (plot): fault density, in faults per inch, against x,
the distance from the starting point (inches).


An example will help illustrate these ideas and the process of
kernel mapping. Consider samples to detect the density of defects
in a tile floor. Each tile is 0.5 in across. We count the number of
defects per tile, beginning at one edge of the tile mosaic. We will
show the samples collected along a line, but the process and
principles are similar in two dimensions.

Figure 12-25 shows the results of sampling along a line segment.
One defect, or fault, is found on a tile located 2 in from the
start, and it is represented by a rectangle two units tall. Each
fault represents a density of two units, because each tile is 0.5
in across; hence 1/0.5 = 2 faults/in. We observe two faults at 2.5
in (four faults/in), one at 3.5 in, and additional observations
until our last fault observed at 12.5 in. Note that the density is
in the form of rectangles that are "stacked" two units high for
each fault observed for a tile.

Note two things about the density estimates. First, we assume a
characteristic shape for the density derived from each observation.
In Figure 12-25, we assume the shape of a rectangle for each
observation, with a uniform density across the tile.


Figure 12-25: Each bar represents an average density; the height
of the bar is 2 for each fault observed in a tile, where each tile
is 0.5 inches wide. This gives a local estimate of the density of
faults per inch of tile. For example, the tile at x distance = 5.5
has four faults, yielding an average density there of 4 / 0.5, or 8
faults per inch.

This may not be true, but in our case we are using a discrete
sample, and so it is a valid approximation. In general, this shape
is called a density distribution. This characteristic shape
(density distribution) is then placed for each observed sample; for
example, note that there is a rectangle placed for each defect we
observe at a distance from the starting point in Figure 12-25.

Second, note that the shapes (density distributions) are added
vertically in areas where they coincide, as shown in Figure 12-25.
In our example, rectangles are stacked. With more complex,
mathematically defined density distributions, the values are added
over each point. The cumulative density distribution is the sum of
the distributions associated with each sample.

Density distributions typically are not squares or other geometric
figures, but rather symmetric shapes such as parabolas, Gaussian
curves, or otherwise smoothly varying surfaces about a center
point. These shapes can be mathematically defined and specified for
each sample point. For example, a general Gaussian curve for one
variable has the form:

g(x) = c \, e^{-(x - x_i)^2 / 2}    (12.15)

where x_i is the sample location and c is a scaling constant. This
is a symmetric function about x_i, meaning the function is a mirror
image reflected across both sides of the point x_i (Figure 12-26).
Note that the density distribution in the figure reaches a peak at
x_i, and the area under the curve is typically equal to one. The
formula is often written with h = 1, or may be scaled by dividing
by a value h, so that it appears as:

g(x) = \frac{c}{h} \, e^{-(x - x_i)^2 / (2 h^2)}    (12.16)

The value h is also known as a bandwidth parameter, and is
described in the next few paragraphs.

Many functional forms can be used to represent the kernel
densities. Typically, these shapes are "bumps" in that they
smoothly rise to a peak and then descend to near zero. Different
forms of the kernel density function may have characteristic
shapes: how fast they reach the peak, how pointed the peak becomes,
and how quickly they return to values near zero at points more
distant from the peak.

The composite density distribution is created by "stacking" our
individual density distributions from the set of observations
(Figure 12-27a and b). Density distributions may be plotted for
each observation; for example, two of many observations are shown
in Figure 12-27a. Each point yields a smooth "bump" centered on the
observation.

Figure 12-26: vertical axis, density value.

Figure 12-27: (a) Two sample points, at x = 22 and x = 85, plotted
as individual density distributions, with h = 2.2. (b) Many sample
points, plotted as individual and composite density distributions,
with h = 2.2. (c) Many sample points, plotted as individual and
composite density distributions, with h = 1. In each panel the
horizontal axis is the sample location and the vertical axis is the
density function.


When all observed points are plotted, there
is a commensurately large number of small,
overlapping bumps, as shown by the thin
lines in Figure 12-27. These may then be
summed vertically to create the cumulative
density distribution, shown by the thick line
in Figure 12-27b.
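
The stacking described above can be sketched in a few lines. This is a minimal illustration, not the text's own implementation: it assumes the standard unit-area Gaussian kernel, divides the summed bumps by the number of samples so the composite also has unit area, and uses the two sample locations from Figure 12-27a. The function names are hypothetical.

```python
import math

def gaussian_kernel(x, x0, h):
    """Unit-area Gaussian bump centered on sample location x0, bandwidth h."""
    return math.exp(-((x - x0) ** 2) / (2.0 * h ** 2)) / (h * math.sqrt(2.0 * math.pi))

def composite_density(x, samples, h):
    """Sum ("stack") the individual bumps at x; dividing by n is an
    assumption made here so the composite area is also one."""
    return sum(gaussian_kernel(x, x0, h) for x0 in samples) / len(samples)

samples = [22.0, 85.0]                 # sample locations from panel a
step = 0.5
xs = [i * step for i in range(201)]    # 0 to 100 along the sampling line
density = [composite_density(x, samples, 2.2) for x in xs]
area = sum(density) * step             # simple Riemann sum of the composite
```

With only two well-separated samples the composite is simply two bumps; adding more samples produces the overlapping pattern described for the remaining panels.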

We often choose bandwidth parameters, symbolized by h, that define the
"spread" or width of the individual density distributions (Figure 12-27c
and Figure 12-28). Perhaps the simplest way to understand the bandwidth
is to think of the binning interval in our example in Figure 12-25.
There, we counted tile defects for each 0.5-in tile, and plotted a
rectangle corresponding to the resultant fault density. Our bandwidth was
set at 0.5 in. We could just as well use a bandwidth of 1 in, counting
the number of defects per two tiles (1 in) along our sampling line. This
would give a related, but slightly different, estimate of the density
distribution of defects along our sampling line. As shown in the right
panel of Figure 12-28, the estimated density distribution for the first
7 in of our sampled line is less "peaked" or "spikey." Although the same
sample set is used to estimate both density distributions, each
observation is spread across a broader interval when we choose a larger
bandwidth.

We observe the same change in the peakedness when we change the
bandwidth for continuous density distributions, such as the Gaussian
distribution shown in Figure 12-27 and equation 12.16. A sample is
plotted using a Gaussian density function for each observation and a
bandwidth of h = 2.2 in Figure 12-27b. Reducing the bandwidth to h = 1
narrows the shape for each individual sample and results in higher,
narrower, more peaked shapes in the cumulative distribution shown in
Figure 12-27c.
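
The effect of bandwidth on peakedness can be checked numerically. The sketch below assumes a unit-area Gaussian kernel and a small, hypothetical set of sample locations (not the data behind Figure 12-27); it compares the highest value of the composite density for h = 2.2 and h = 1.

```python
import math

def composite_density(x, samples, h):
    """Mean of unit-area Gaussian kernels of bandwidth h, evaluated at x."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-((x - s) ** 2) / (2.0 * h * h)) for s in samples) / norm

# Hypothetical sample locations along a 0-100 sampling line.
samples = [20.0, 22.0, 23.5, 40.0, 41.0, 60.0, 85.0]
xs = [i * 0.25 for i in range(401)]

peak_wide = max(composite_density(x, samples, 2.2) for x in xs)    # h = 2.2
peak_narrow = max(composite_density(x, samples, 1.0) for x in xs)  # h = 1
```

For these samples the narrower bandwidth gives the taller maximum, which is the "more peaked" behavior described above.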

Kernel mapping is generally a three-step process, as may be surmised
from the preceding discussion. First, we collect samples and the
concomitant coordinate locations. Second, we choose a kernel density
function. Finally, we choose a bandwidth, h, apply the kernel density
distribution, and sum across each sample area to achieve our composite
estimate of density.


Mathematically this process is summarized by the equation:

[Equation]

where the left-hand term is the composite density distribution at
location (x, y), n is the number of samples, h is the bandwidth, and
k(x, y) is the individual density distribution applied at each sample
point.
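
The three steps can be sketched for a two-dimensional case. This is an illustration under stated assumptions: a standard two-dimensional Gaussian kernel, a 1/n scaling, hypothetical sample coordinates, and a hypothetical grid extent and cell size.

```python
import math

def kernel_density_2d(x, y, samples, h):
    """Composite density at (x, y): the mean of 2-D Gaussian kernels of
    bandwidth h centered on each sample location."""
    total = 0.0
    for sx, sy in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        total += math.exp(-d2 / (2.0 * h * h)) / (2.0 * math.pi * h * h)
    return total / len(samples)

# Step 1: samples and their coordinates (hypothetical values).
samples = [(2.0, 2.0), (2.5, 2.2), (3.0, 1.8), (7.0, 7.5)]
# Step 2: the kernel is the Gaussian above. Step 3: choose h and sum.
h = 0.8
cell = 0.25
grid = [[kernel_density_2d(c * cell, r * cell, samples, h) for c in range(41)]
        for r in range(41)]

# Approximate volume under the surface over the 10 x 10 study area.
volume = sum(sum(row) for row in grid) * cell * cell
```

The volume falls slightly below one because a small part of each kernel extends past the edge of the hypothetical study area.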


An example of kernel density estimation is shown in Figure 12-29. An
individual sample point is shown in Figure 12-29a, with a single peak
corresponding to the Gaussian density distribution chosen. A more
complex shape with multiple peaks occurs when all sample points are
plotted, as shown in Figure 12-29b. Individual distributions are summed
vertically, resulting in an undulating, complex surface. This surface
represents the density or probability of occurrence of the variable, for
example, the crime density mapped across a city, or the utilization
density for a wolf pack in their home range.


While the choice of bandwidth affects our results, there is no uniformly
best method to select the appropriate value for h. One commonly applied
method is to plot several density surfaces, one for each of a set of h
values, and select the h that most closely approximates your perception
of the best density. Insights into the distribution and behavior of the
data set are often gained by analyzing densities across a range of
bandwidth values.

Formulas exist for optimum bandwidths under various conditions. One
method for calculating optimum bandwidth has been proposed by
Fotheringham et al. (2000) for a Gaussian kernel:

[Equation]

where $h_{opt}$ is the optimum bandwidth, n is the number of samples, and
$\sigma$ is the standard deviation parameter, unknown but estimated
from the sample.
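
A form of this optimum that is often quoted from Fotheringham et al. (2000) for a Gaussian kernel is h_opt = [2/(3n)]^(1/4) σ. The sketch below assumes that form; treat it as an assumption to be checked against the cited book rather than as the definitive formula. The sample values are hypothetical, and σ is estimated here with the ordinary sample standard deviation.

```python
import statistics

def optimum_bandwidth(n, sigma):
    """Assumed form: h_opt = (2 / (3 n)) ** (1/4) * sigma."""
    return (2.0 / (3.0 * n)) ** 0.25 * sigma

# Hypothetical one-dimensional sample used to estimate sigma.
values = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7]
sigma = statistics.stdev(values)
h_opt = optimum_bandwidth(len(values), sigma)
```

Under this form the optimum bandwidth shrinks slowly as the number of samples grows, and scales directly with the spread of the data.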

Numerous formulas exist defining optimum bandwidths, and one is faced
with a rather difficult choice of selecting the correct optimum. The
motivations behind various optimum bandwidths are described in the books
by Silverman (1986) and by Fotheringham et al. (2000), listed at the end
of this chapter.


Core area delineation is a primary use for estimated density
distributions. As expected, the identified core areas are dependent on
the selected bandwidth. Figure 12-30 shows vertical views of
two-dimensional density distributions for optimum (a), below-optimum
(b), and above-optimum (c) bandwidth. Darker shades of gray show higher
densities; note the narrower, more concentrated distributions at the
lowest bandwidth (b) relative to the largest bandwidth (c). These
different bandwidths result in different core area polygons (d through
f). Empirical tests and experience guide the choice of best bandwidth.
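
One simple way to turn a gridded density surface into a core area is to collect the densest cells until they hold a chosen share of the total density. The text does not state which rule was used for Figure 12-30, so the rule, the function name, and the small density grid below are all assumptions made for illustration.

```python
def core_cells(density, fraction):
    """Return the (row, col) cells, taken from densest to least dense,
    that together hold at least `fraction` of the total density."""
    cells = sorted(((v, r, c) for r, row in enumerate(density)
                    for c, v in enumerate(row)), reverse=True)
    total = sum(v for v, _, _ in cells)
    core, running = set(), 0.0
    for v, r, c in cells:
        if running >= fraction * total:
            break
        core.add((r, c))
        running += v
    return core

# Hypothetical density surface (arbitrary units).
density = [
    [0, 1, 1, 0],
    [1, 6, 4, 0],
    [0, 3, 2, 0],
    [0, 0, 1, 1],
]
core50 = core_cells(density, 0.5)   # cells holding half of the density
core90 = core_cells(density, 0.9)   # a larger, more inclusive core
```

Because the density surface itself depends on the bandwidth, the cells selected by any such rule change with h as well.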
Time-Geographic Density Estimation


Density estimators have been developed for space utilization by moving
objects, typically animals for home range analysis, although sometimes
other objects. An object may be observed periodically through space, for
example, when a GNSS receiver is attached to a migrating penguin, and
the position relayed to a base station. These positions are often called
control points, because they establish the location of the tracked
object at a fixed point in time. This sequence of control points defines
a path (Figure 12-31).
[Figure 12-31: A sequence of control points on a path.]


While locations between observations can- 
not be precisely determined, the control 
points constrain where the penguin might 
have been, because there is an upper limit on 
how fast the bird can travel. We may estab- 
lish a maximum velocity, v, either from previous
observations, from the current tracking
effort, or from theoretical limits. Time-geo- 
graphic density estimation (TGDE) com- 
bines a sequential set of control points with 
knowledge about maximum velocity to esti- 
mate spatial occurrence probabilities. 

TGDE depends on the concept of a geoellipse between two points. If $P_i$
is the control point at time i and $P_j$ the control point at time j,
then the geoellipse $g_{ij}$ may be defined as:

$$ g_{ij} = \{ P \mid D(P_i, P) + D(P, P_j) \le ML_{ij} \} \qquad (12.21) $$

where $D(P_i, P)$ is a distance between any point P and the control point
$P_i$, and $ML_{ij}$ is the maximum distance the object could possibly
travel between the successive control points $P_i$ and $P_j$. $ML_{ij}$
may be estimated by:

$$ ML_{ij} = (t_j - t_i) \, v \qquad (12.22) $$

where $t_j$ is the time of observation of control point j, and v is the
maximum velocity for the object.
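
A geoellipse membership test follows directly from this definition: a location is inside the geoellipse if the distance from the first control point to the location, plus the distance from the location to the second control point, does not exceed the maximum travel distance. The sketch below assumes Euclidean distance and hypothetical coordinates, times, and velocity.

```python
import math

def in_geoellipse(p, p_i, p_j, t_i, t_j, v):
    """True if location p could have been visited between control points
    p_i (observed at t_i) and p_j (observed at t_j) at maximum velocity v."""
    max_distance = (t_j - t_i) * v            # ML, the maximum travel distance
    return math.dist(p_i, p) + math.dist(p, p_j) <= max_distance

# Hypothetical control points 6 units apart, observed 10 time units apart.
p_i, p_j = (0.0, 0.0), (6.0, 0.0)
on_path = in_geoellipse((3.0, 0.0), p_i, p_j, 0.0, 10.0, 1.0)
on_edge = in_geoellipse((3.0, 4.0), p_i, p_j, 0.0, 10.0, 1.0)
outside = in_geoellipse((3.0, 4.1), p_i, p_j, 0.0, 10.0, 1.0)
```

Another distance function could be substituted for `math.dist` without changing the structure of the test.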


Figure 12-32 illustrates a geoellipse for two control points, $P_i$ and
$P_j$. Note that the distance function need not be Euclidean distance,
but it usually is. The tracked object is restricted to have been within
the drawn ellipse, provided our estimate of v is valid. The size and
shape of the ellipse depend on the distance between the successive
control points, the time interval between the observations, and the
maximum velocity possible. Successive points near each other relative to
the maximum distance, given the time difference and maximum velocity,
will be enclosed in a nearly circular ellipse, while successive points
very near the maximum possible distance will be joined by a long, very
narrow ellipse.


Much as when using kernel density functions for estimating a core area,
a time-geographic estimate of space use is a composite of many
observations. Here, each pair of observations may be considered a
density volume, proportional to the probability that the object occupied
a location during the time interval (Figure 12-33). A uniform density
function is the simplest to understand, implying the object was moving
at maximum velocity between the two controlling observations, but along
an unknown path within the ellipse. A uniform density function should
have a volume equal to the likelihood of occupancy, as with a standard
kernel density estimator. For simple shapes such as a three-dimensional
uniform probability distribution, the volume is equal to the area times
height.

Figure 12-33 shows two points, $P_1$ observed at time $t_1$, and $P_2$
observed at time $t_2$. Our task is to calculate the area of the elliptic
volume that represents the occupancy probability, given our
observations. Two paths of travel are shown between the points. These
two paths have the same length, by the definition of the bounding
ellipse, and they are also each equal to the long axis length of the
ellipse, 2a. Geometric relationships between the interpoint distances
and the dimensions of an ellipse allow us to calculate a and b, two
characteristic dimensions, which in turn allow us to calculate the area,
$\pi a b$. This may then be scaled by height to assign an occupation
probability (Figure 12-33, lower half).
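
The geometry can be worked through with standard ellipse relationships: the control points are the foci, the long axis 2a equals the maximum travel distance, and the semi-minor axis follows from b^2 = a^2 - (d/2)^2, where d is the distance between the control points. The sketch below assumes Euclidean distance and, as an illustrative choice, a uniform height of 1/area so that the volume is one; the coordinates, times, and velocity are hypothetical.

```python
import math

def geoellipse_dimensions(p1, p2, t1, t2, v):
    """Return (a, b, area, height) for the geoellipse between two control
    points. Height is chosen so that area * height = 1 (an assumption)."""
    two_a = (t2 - t1) * v                 # long axis = maximum travel distance
    d = math.dist(p1, p2)                 # distance between the foci
    if d > two_a:
        raise ValueError("points are farther apart than v allows")
    a = two_a / 2.0
    b = math.sqrt(a * a - (d / 2.0) ** 2)
    area = math.pi * a * b
    height = 1.0 / area if area > 0.0 else None   # degenerate line: no area
    return a, b, area, height

a, b, area, height = geoellipse_dimensions((0.0, 0.0), (6.0, 0.0), 0.0, 10.0, 1.0)
```

Here the points are 6 units apart and the object could have traveled 10 units, giving a = 5, b = 4, and an area of 20π.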

The process is repeated for overlapping point pairs across the entire
set of control points. The next two points in the sequence, $P_2$ and
$P_3$, are paired, and the density ellipse calculated, summing the
densities where geoellipses overlap. The process is repeated for points
$P_i$, $P_{i+1}$ until the last point is reached. Figure 12-34
illustrates the overlapping set of geoellipses from a sample set.

[Figure 12-33: The process of calculating a time-geographic density.
Two points, $P_1$ and $P_2$, are measured at times $t_1$ and $t_2$, for
an object with a maximum velocity v. The point locations, time interval,
and v define an ellipse. The ellipse area can be calculated, with the
ellipse height scaled to a density volume proportional to the likelihood
of occupation. The subsequent pair of points ($P_2$ and $P_3$, not shown)
are processed, and a new volume added to the occurrence surface, similar
to kernel mapping.]

[Figure 12-34: An example of an overlapping set of geoellipses used to
create a density function for a data set (adapted from Downs et al.,
2011).]

Ellipses may vary in shape, depending on time interval, distance between
features, and the maximum velocity (Figure 12-34). Longer time intervals
between observations result in larger ellipses, irrespective of the
distance between subsequent points. As the interpoint distance
approaches the maximum set by the maximum velocity, v, the ellipses
become longer and narrower, and reduce to a line when the points are
spaced at the maximum possible distance. Conversely, the ellipses
approach circles when the time interval between points is long but the
observed distance between points is small. This occurs when the object
has not moved much relative to how far it might have moved in the time
interval between observations.

The composite time-geographic density function is shown in equation
(12.23), where f(x) is the density at any point across a surface; n is
the number of observed control points; $t_s$ and $t_e$ are start and end
times, respectively; i and j are consecutive point pairs; v is the
maximum velocity; G is the density function applied to each geoellipse;
and D(P, P) is the distance function, as described in equation 12.20.
This equation is used for a set of points to estimate the density across
space. The numerator sums the weighted distance ellipse functions for
each pair of sampling points, and the denominator scales this by the
maximum distance that may have been traveled during that time interval.

$$ f(x) = \frac{1}{(n-1)\,[(t_e - t_s)\,v]^2} \sum_{i=1}^{n-1} G\left[ \frac{D(P_i, P) + D(P, P_j)}{(t_j - t_i)\,v} \right] \qquad (12.23) $$


The composite of individual ellipses 
may result in complex aggregate density vol- 
umes. Densities will be highest where points 
are clustered or near where paths intersect 
frequently. Sharp edges and sampling arti- 
facts may occur when using a uniform den- 
sity function, at least until sample size 
becomes large. 

Although Figure 12-33 illustrates a TGDE using a uniform distribution
function, other functions may be used. One form assumes the likelihood
of occupation decreases linearly with the distance from the line
connecting two subsequent control points. This is often called a linear
decay function, because the occupation likelihood is assumed to decrease
linearly with distance. A more rapid or less rapid decrease with
distance may be represented by other functions.
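
The difference between a uniform and a linear decay function can be sketched by evaluating a composite density at a single location. The structure below is an assumption based on the description in this section: each consecutive pair of control points contributes a function of the ratio of path distance to maximum travel distance, and the sum is scaled by n - 1 and by the squared maximum travel distance over the whole track. The particular linear decay form (one on the straight line between the points, zero at the geoellipse boundary) and all data values are hypothetical.

```python
import math

def uniform(ratio, ratio_min):
    """Constant weight anywhere inside the geoellipse (ratio <= 1)."""
    return 1.0 if ratio <= 1.0 else 0.0

def linear_decay(ratio, ratio_min):
    """Weight falls linearly from 1 on the line between the control points
    (ratio == ratio_min) to 0 at the geoellipse boundary (ratio == 1)."""
    if ratio > 1.0:
        return 0.0
    if ratio_min >= 1.0:          # degenerate ellipse: a line segment
        return 1.0
    return min(1.0, (1.0 - ratio) / (1.0 - ratio_min))

def tgde(x, points, times, v, g):
    """Composite time-geographic density at location x for weight function g."""
    n = len(points)
    scale = (n - 1) * ((times[-1] - times[0]) * v) ** 2
    total = 0.0
    for i in range(n - 1):
        max_distance = (times[i + 1] - times[i]) * v
        ratio = (math.dist(points[i], x) + math.dist(x, points[i + 1])) / max_distance
        ratio_min = math.dist(points[i], points[i + 1]) / max_distance
        total += g(ratio, ratio_min)
    return total / scale

points = [(0.0, 0.0), (6.0, 0.0), (6.0, 8.0)]   # hypothetical control points
times = [0.0, 10.0, 25.0]                        # hypothetical observation times
d_uniform = tgde((3.0, 0.0), points, times, 1.0, uniform)
d_linear = tgde((3.0, 0.0), points, times, 1.0, linear_decay)
d_far = tgde((20.0, 20.0), points, times, 1.0, uniform)
```

The location (3, 0) lies inside both geoellipses, so the uniform function counts both pairs fully, while the linear decay function discounts the second pair because the location is off the line joining its control points.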


The composite time-geographic density estimate in Figure 12-35
illustrates a space-time path and a linear decay function applied to
associate a probable occupancy area for the path. Panel a shows the
control points for a path, with the time between successive observations
shown by point size. Panels b through d show TGDE calculated using
different maximum velocities. Note that the highest densities (darkest
shades) show where the control points are clustered and where the
distance between observations is short relative to the time period
between observations, which in turn is dependent on maximum velocities.
This implies the net object movement was small between control points,
although there is a denser area of likelihood there.


While a maximum velocity may be established from observations or
theoretical values, the shape of the distance function often is not, and
must be carefully developed. A uniform function may be more defensible
if the object is moving at near the maximum speed for most of the
duration. However, a linear decay function may make more sense when the
sampling interval varies in frequency, and the object is often moving
much more slowly than the maximum velocity. TGDE is a developing field,
and the interested reader should refer to the papers by Downs and
colleagues listed at the end of this chapter.


Summary 


Interpolation and spatial prediction allow us to estimate values at
locations where they have not been measured. These methods are commonly
used because our budgets are limited, samples may be lost or found
wanting, or because time has passed since data collection. We may also
interpolate when converting between data models, for example, when
calculating a raster grid from a set of contour lines, or when
resampling a raster grid to a finer resolution.

Spatial prediction involves collecting samples at known locations and
using rules and equations to assign values at unsampled locations. There
are many ways to distribute a sample, including a random selection of
sample locations, a systematic pattern, clustering samples, adaptive
sampling, or a combination of these. The sampling regime should consider
the cost of travel and collecting samples, as well as the nature of the
spatial variability of the target feature and the intended use of the
interpolated surface.


Sample values are combined with sample locations to estimate or predict
values at unsampled locations. There are many spatial prediction
methods, but the most common are Thiessen (nearest neighbor) polygon,
local averaging (fixed radius), inverse distance weighted, trend
surface, and kriging interpolation. Each of these methods has advantages
and disadvantages relative to each other, and there is no method that is
uniformly best. Each method should be tested for the variables of
interest, under conditions in the study area of interest. The best tests
involve comparisons of interpolator estimates against withheld sample
points.


Measures of core area are commonly identified from spatially distributed
observations. This form of prediction identifies regions of high
probability for an object or event. Mean center or mean circle are
simple measures. A convex hull, defined as the minimum area polygon
encompassing all points and with convex exterior angles, is commonly
applied. More sophisticated measures include kernel mapping, based on
centering scaled distribution functions over each observation, and
vertically summing the distribution functions.
Study Questions
12.1 - Why perform a spatial interpolation? 


12.2 - Describe four different sampling patterns, and provide the relative advantages 
or disadvantages of each. Which do you think is used most in practice, and why? 


12.3 - Draw a systematic sampling pattern on the area below, left, and an adaptive
sampling pattern on the area below, right. Use the same number of sample points,
e.g., approximately 50, on both. Which do you think will give a better estimate of
terrain locations at unknown points? Why? Would increasing the sample number
change which sampling design you would think is best?


12.4 - Draw a cluster sampling pattern on the area below, left, and an adaptive
sampling pattern on the area below, right. Use the same number of sample points,
e.g., approximately 50, on both. Which do you think will give a better estimate of
terrain locations at unknown points? Why? Would increasing the sample number
change which sampling design you would think is best?




12.5 - Draw the Thiessen polygons (nearest neighbor interpolation) for the set of 


points 


12.6 - Draw the Thiessen polygons (nearest neighbor interpolation) for the set of 


points below. 
12.7 - Calculate the cell values indicated at the crosses below, using fixed radius
sampling size with the shown circle.

12.8 - Calculate the cell values indicated at the crosses below, using fixed radius
sampling size with the shown circle.

12.9 - Calculate the Z values for the unknown points listed below, using an inverse
distance weighted approach. Use the three nearest known points (use i = 3, n = 1).
Known points are shown in the map as filled circles, with corresponding coordinates
and Z values in the table at right.


[Table: ID, X, and Y coordinates for known points 1 through 15, and X,
Y, and Z columns for the unknown points.]

12.10 - Calculate the Z values for the unknown points listed below, using an inverse
distance weighted approach. Known points are shown in the map as filled circles,
with corresponding coordinates and Z values in the table at right.


x Y z 
353951] 43787145 | 2040€ 
1512809 | 44796473 | 18630. 
1482285 | 44721434 | 19921 
1489068 | 44649785 | 29401 
1523832 | 44669288 | 21063 
1888273 | 4.075.465 | 22832 
1605073 | 4.75842 | 19335 
1630244 
1642658 | 44811735) 18383 
156.050 | 24682854 | 25219 


(Unt рон: 10| 1641891 | 44683704] 21388 
p x Y z Га] 306927 | 14687093 | 16542 
a [1558595 [44771590 12| 1565379 | 44647242 | 18669 
b[wrssos 44788042 13| 1673053 | 44628165] 24538 
c [1625357 [44695602] 1a | 1433465 | 44824154 | 18379 
21817109 [44837552 15] 1447097 | 44753230 | 1928 
12.11 - What is a primary difference between a spline interpolation method and a
trend surface interpolation? 


12.12 - What is the primary difference between a trend surface interpolation and a 
kriged interpolation? 


12.13 - Describe the variogram. What does it represent on the X and Y axes, and what 
are the important regions/points of the plot? 


12.14 - Draw the approximate mean center, standard deviation circle, and maximum 
circle for the following data: 


12.15 - What is the convex hull? How is it determined?

12.16 - Draw the convex hull for the points depicted below.

12.17 - Draw the convex hull for the points depicted below.


12.18 - Describe/define a kernel density map. Include how the values are based on
the samples.


12.19 - Write down and describe at least one equation used to generate a density sur- 
face. 
12.20 - Which image below illustrates a wider bandwidth? Assume the same data were
used and the same upper and lower bounds apply for both surfaces.


тад ЖШ Mage ое een ei bent уса al tian ба 
to generate! wo reasons answer, assuming 
ta opar todos Dana om econ 




13 Spatial Models and Modeling


Introduction 


A model is a description of reality. Some models are analog, as in how a
physical globe is a model of the Earth. Here, our interest is restricted
to digital models, computer-based representations of spatial phenomena.
These models describe the basic properties or processes for a set of
spatial features, and help us understand their form and behavior.


Many computer-based models use spatial data, and are developed and run
using some combination of GIS, general and specialized computer
programming languages, and spatial and non-spatial analytical tools.
Spatially explicit models are a primary benefit of GIS technologies, and
many spatial models are based on data in a GIS. These models may be run
in the GIS, or the spatial data may be prepared in a GIS and then
exported to a model that is developed and run outside a GIS.


While there may be as many classes of models as there are modelers, here
we split spatial models into three broad and overlapping categories:
cartographic models, simple spatial models, and spatio-temporal models.
Joseph Berry, an early and well-known developer and proponent of spatial
modeling, described cartographic models as automating manual map
analysis and processing, while spatial models focus on applying
mathematical relationships. Cartographic models are most often applied
to identify areas in support of decision making, while simple spatial
models often apply sets of equations to predict a specific continuous
variable across space. Cartographic model outputs are often nominal
(suitable or unsuitable) or ordinal (low, medium, or high suitability).
Figure 13-1, for example, shows the ordinal suitability of locations for
septic systems as a function of soils, elevation, wetlands, and
watercourse proximity. In contrast, the outputs from simple spatial
models are often interval/ratio measures such as population density,
accident frequency, or soil erosion rates.

Cartographic models combine 
layers via overlay, buffers, reclassification, 
and other spatial operations. These models 
often employ the concepts of map algebra, 
described in Chapter 10, but may include a 
broader range of operations. Suitability 
analyses, defined here as the classification 
of land according to its utility for specific
uses, are among the most common carto- 
graphic models but there are many others. 

Most cartographic models are temporally static because they represent
spatial features at a fixed point in time. Data in base layers are
mapped for given periods. These data are the basis for spatial
operations that may create new data layers. For example, we may be
interested in identifying the land that is most suitable for siting
septic systems. Costs of installing the system may depend on the slope
(steeper is costlier), soil type (some soils are harder to move or less
useful for filtration), current land cover (covered by concrete is
unsuitable, forests are expensive to clear), or distance to roads (heavy
machinery is expensive to move). Spatial data on elevation, soil
properties, current land use, and road proximity may be combined to
categorize sites by their suitability for installing septic systems. We
may use a mathematical relationship for specific calculations of average
costs; for example, we can characterize the relationships between soil
types and cost to excavate dirt, and we may use a cost-per-mile for
transport based on local rates. These spatial data are combined in a
cartographic model to assign a land value to each location in a study
region. The model is temporally static in that the values for the
spatial variables, such as soil type or distance to roads, do not change
during the analyses.

[Figure 13-1: A suitability map, produced by combining soils, elevation,
wetland, and watercourse data in a cartographic model.]

Cartographic models may be used to analyze change even if they are
generally not temporally dynamic. For example, we may wish to analyze
vegetation change over a 10-year period, based on vegetation maps
produced at the start and end of the period. Each data layer represents
the vegetation boundaries at a fixed point in time. The model is static
in the sense that the polygon boundaries for a given layer do not change
because the vegetation boundaries are mapped as found at each point in
time. Our cartographic model is temporal in how it compares vegetation
change through time, but the model does not generate new boundaries at
new locations. It does create a composite of existing lines that exist
in the input data layers and notes, as attributes, the kind of change
that occurred. Most GIS-based spatial models are cartographic models
that are temporally static.

Simple spatial models typically apply a set of equations to spatially
resolved variables (Figure 13-2). They often rest on equations developed
from observations at points or subareas, and then applied across broader
geographic areas.

For example, William Cooke and col- 
leagues reported on a model of West Nile 
virus infection among birds and the related 
risk of transmission to humans. West Nile 


virus is a sometimes fatal disease that varies
in prevalence through space and time, in part
because mosquitoes transmit the virus
among birds and people.

Cooke and his associates developed relevant input variables. They
compiled data on the frequency of bird and human infections within each
zip code in Mississippi over several years. Human and bird cases were
clustered, with outbreaks concentrated in rural areas, and so road
density was used as a surrogate for rural/urban land use. They also
compiled spatial variables related to mosquito habitat quality,
including stream density, vegetation type, temperature, and
precipitation surplus. These were combined with virus infection
frequency at specific locations to fit a predictive statistical model.
Mapped spatial variables were then applied in the model to predict
outbreak risk across the state.

Simple spatial models are common, with hundreds of examples found across
a range of disciplines. They typically include a model derived through
sampling and a statistical fitting process, a model that is subsequently
applied across space to estimate important events, densities, or other
characteristics.
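
The fit-then-apply pattern can be sketched with an ordinary least-squares line. This is a generic illustration, not the model used in the West Nile virus study: the predictor, the response, the sample values, and the small raster are all hypothetical, and a single predictor stands in for what would normally be several mapped variables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Step 1: observations at sampled locations (hypothetical values).
sample_predictor = [1.0, 2.0, 3.0, 4.0]    # e.g., a mapped habitat index
sample_response = [3.0, 5.0, 7.0, 9.0]     # e.g., an observed rate

# Step 2: fit the statistical model to the samples.
intercept, slope = fit_line(sample_predictor, sample_response)

# Step 3: apply the fitted model to every cell of a mapped predictor layer.
predictor_raster = [
    [0.5, 1.5, 2.5],
    [3.5, 4.5, 5.5],
]
predicted = [[intercept + slope * value for value in row] for row in predictor_raster]
```

Cells with predictor values outside the sampled range, such as 5.5 here, are extrapolations and deserve extra caution.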


Spatio-temporal models are dynamic in both space and time. They differ
from cartographic or predictive spatial models in that time passes
explicitly within the running of the model, and changes in time-driven
processes within the model cause changes in spatial variables.
Spatio-temporal models often attempt to explicitly represent processes
within the model.


The dispersion of oil after a spill is an example. Processes such as
separation and evaporation on exposure to air might be combined in a
model to predict the changing location of an oil slick. The actions of
objects as they move across an environment may also be represented in a
spatio-temporal model.


Spatio-temporal models include time-driven processes within the
framework of the model. These processes are typically quite detailed and
include substantial computer code to represent important subprocesses.
Our oil evaporation example demonstrates the subprocesses represented in
a dynamic spatial model. Oil evaporation rates depend on many factors,
including oil viscosity, component oil fractions, wind speed,
temperature, wave height and action, and sunlight intensity. These
processes may be modeled by suitable functions applied to spatially
defined patches of oil. The submodel may estimate evaporation of various
components of the oil in the patch, and update the characteristics of
oil in that patch. Oil chemistry and viscosity may change due to more
rapid evaporation of lighter components, in turn affecting future
evaporation calculations. Spatial features may change through time due
to the represented dynamic process; for example, the boundary defining
an oil spill may vary as the model progresses.
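
The idea of time passing explicitly inside a model can be sketched with a single patch of oil that loses its lighter component faster than its heavier one. Everything numeric here is hypothetical: the two-component split, the evaporation rate constants, the time step, and the starting volumes are chosen only to show how each step updates the state that the next step uses.

```python
def evaporate(patch, hours, dt=1.0, k_light=0.10, k_heavy=0.01):
    """Step a patch forward in time. Each step removes a fixed fraction of
    each component (hypothetical rates per hour) and records total volume."""
    light, heavy = patch["light"], patch["heavy"]
    history = []
    for _ in range(int(round(hours / dt))):
        light -= k_light * light * dt     # lighter fraction evaporates quickly
        heavy -= k_heavy * heavy * dt     # heavier fraction evaporates slowly
        history.append(light + heavy)     # patch volume after this time step
    return {"light": light, "heavy": heavy}, history

patch = {"light": 50.0, "heavy": 50.0}    # hypothetical starting volumes
final, history = evaporate(patch, hours=10)
light_fraction = final["light"] / (final["light"] + final["heavy"])
```

After ten steps the patch is smaller and richer in the heavy component, so a fuller model would use the updated composition when computing viscosity and later evaporation.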

Spatio-temporal models are typically more limited than other modeling
approaches in the range and number of spatial themes analyzed, but they
provide a more mechanistic representation of dynamic processes.
Substantial effort goes into developing submodels of important
processes. Model components and structures focus on one or a few key
output spatial variables, and input data themes are included only as
they are needed by these subprocess models. These temporally dynamic
models explicitly calculate the changes in the output spatial variables
through time. Feature boundaries, point feature locations, and attribute
variables that reflect the spatial and aspatial characteristics of key
output variables may change within the model run, typically multiple
times and with an explicit temporal frequency.

Simple spatial models and spatial statistical analyses are often used as
precursors to spatio-temporal models. By uncovering key processes or
rates, they can guide further analysis. For example, in our oil spill
example, the specific relationship between wave height or frequency and
oil separation may be represented by an equation, but the specific
parameters that define the shape of the relationship may be estimated
via a statistical process. Experiments or observations on separation
rates at various wave heights may be collected, and the specific model
parameters estimated. These may then be included as a component of the
larger spatio-temporal model.
Cartographic Modeling

A cartographic model provides information through a combination of
spatial data sets, functions, and operations. These functions and
operations often include reclassification, overlay, interpolation,
terrain analyses, buffering, and other functions. Multiple data layers
are combined via these operations, and the information is typically in
the form of a spatial data layer. Map algebra, described in Chapter 10,
is often used to specify cartographic models for raster data sets.


Suitability analyses are perhaps the most common examples of cartographic
models. These analyses rank land according to its utility for various
purposes. Suitability analyses often involve the overlay, weighting, and
rating of multiple data layers to categorize lands into various classes.
Relevant data layers are combined and the resultant polygons are classified
based on the combination of attributes. Figure 13-3 illustrates a simplistic
cartographic model for the identification of potential park sites. Suitable
sites are those in proximity to roads and lakes, and not in wetlands. The
model uses three input data layers, containing lakes, roads, and hydric
status for a common study area. Spatial operations are applied to the
spatial data layers, including reclassification, buffering, and overlay.
These result in a suitability layer. This suitability layer can then be used
to narrow sites for further evaluation, identify owners, or otherwise aid in
park selection.


corridor studies, the design and development of water distribution systems,
modeling the spread of human disease or introduced plant and animal species,
building and business site selection, pollution response planning, and
endangered species preservation. Cartographic models are so extensively used
because they provide information useful to managers, the public, and policy
makers and help guide decisions requiring the consideration of spatial
location across multiple themes.


Figure 13-3: An example of a cartographic model. The model rates suitable
park sites based on the proximity to roads and lakes, and the absence of
wetlands.

Cartographic models are often succinctly represented by flowcharts. A
flowchart is a graphic representation of the spatial data, operations, and
their sequence of use in a cartographic model. Figure 13-4 illustrates a
flowchart of the cartographic model illustrated in Figure 13-3. Suitable
sites are sought that are near roads, near lakes, and not in wetlands. Data
layers are represented by rectangles, operations by ellipses, and the
sequence of operations by arrows. Operations are listed in each ellipse.
Flowcharts are often required by an agency or organization to document a
completed spatial analysis. Because a consistent set of symbols aids in the
effective communication of the cartographic model, a standard set of symbols
and flowcharting methods may help in understanding the data and operations
in an analysis.

Flowcharts are useful during the development and application of a
cartographic model. Flowcharts aid in the conceptualization and comparison
of various competing approaches and may aid in the selection of the final
model. A flowchart is often an efficient framework for documenting a
cartographic model. File locations, work dates, and intermediate
observations can be noted with reference to the flowchart, or directly onto
a copy of the flowchart.

Cartographic modeling often produces a large number of intermediate or
temporary data layers that are not required in the final output. Our example
in Figure 13-4 illustrates this. The needed information is contained
entirely within the suitability ranking layer, yet several other data layers
are produced within the illustrated cartographic model. Buffered, recoded,
and overlay layers were necessary intermediate steps, but in this analysis
their utility was temporary. This proliferation of data layers is common in
cartographic modeling, and it can cause problems as the new layers and other
files accumulate in the computer workspace. Frequent removal or archiving of
unneeded files is helpful.

Much of the power of cartographic modeling comes from the flexibility of
spatial analysis functions. Spatial functions and operations are a set of
tools that may be mixed and matched in cartographic models. Overlay,
proximity, reclassification, and most other spatial analysis tools are quite
general. These tools may be combined in an astoundingly large number of
ways, in selection and order of application. These variations will result in
different output data layers, even when using the same input data layers.
With a small set of tools and data layers, we can create a huge number of
cartographic models. Designing the best cartographic model to solve a
problem, that is, selecting the appropriate spatial tools and specifying
their sequence, is perhaps the most important and often the most difficult
process in cartographic modeling and spatial analysis more generally.


Designing a Cartographic Model 

Most cartographic models are based on a set of criteria. Unfortunately,
these criteria are often initially specified in qualitative terms, such as
"the slopes must not be too steep." A substantial amount of interpretation
may be required in translating the criteria in a suitability analysis into a
specific sequence of spatial operations. In our present example, we must
quantify what is meant by "too steep." General or qualitative criteria may
be provided and these must be converted to specific, quantitative measures.
The conversion from a qualitative to quantitative specification is often an
iterative process, with repeated interaction between the analyst developing
and applying the cartographic model and the manager or decision-maker who
will act on the resultant information.

We will use a home-site selection exer- 
cise to demonstrate this process. The prob- 
lem consists of ranking sites by suitability 
for home construction. The area to be ana- 
lyzed has steep terrain and is in a seasonally 
cold climate. There are four criteria: 

a) Slopes should not be too steep. Steep slopes
may substantially increase costs or may
preclude construction.

b) Southern aspects are preferred, to enhance
solar warming.

c) Soils suitable for on-site septic systems are
required. There is a range of soil types in
the study area, with a range of suitabilities
for septic system installation.

d) Sites should be far enough from a main road
to offer some privacy, but not so far as to
be isolated.

These criteria must be converted to more specific rules prior to the
development and application of a cartographic model. The decision-maker must
specify what sort of classification is required. Is a simple binary
classification needed, with suitable and unsuitable classes, or is a broader
range of classes needed? If a range of classes is specified, is an ordinal
ranking acceptable, or is an interval/ratio scale preferred? These questions
are typically answered via discussions between the analyst and the
decision-makers. Each criterion can then be defined once the type and
measurement scale of the results are specified. It may be fairly simple to
establish the local slope limit that prohibits construction. For example,
conversations with local building experts may identify 30 degrees as a
threshold beyond which construction is infeasible. Further work is required
to quantify how less-steep slopes affect construction costs. Similar
refinements must be made for each criterion. We must quantify the range and
any relative preferences for southern aspects, relative soil suitabilities,
what defines a main road, and what constitutes short and long distances.


A second key consideration involves the availability and quality of data. Do
the required data layers exist for the study area? Are the spatial
accuracies, spatial resolution, and attributes appropriate for the intended
analysis? How will map generalizations affect the analysis; for example,
will inclusions of different soil types in a soil polygon lead to
inappropriate results? Is the minimum mapping unit appropriate? If not, then
the requisite data must be obtained or developed, or the goals and
cartographic model modified.


Weightings and Rankings 


While some cartographic models are simple and restrictive, many more
cartographic models require the combination of criteria that vary across a
range of values, and require an explicit ranking of the relative importance
of different classes or types of criteria. A simple, restrictive example
might require us to identify parcels greater than a certain size and within
a certain distance of water. We may clearly identify areas that meet these
desired conditions.


A much more common class of problems requires us to integrate multiple
criteria that are qualitatively different. For example, site suitability for
hazardous waste storage depends on a number of factors including distance to
population centers, transportation, geology, and aquifer depth and type. We
must rate sites across a range of values for all of these variables. Once
criteria are precisely defined, we must obtain appropriate data, develop a
flowchart or plan for our analysis, and address the more difficult problem
of assigning rankings within each criterion, and assigning the relative
weightings among criteria. Note that in the following discussions we use the
word "rankings" when describing the assignment of relative values within the
same layer, such as how we rank a sandy soil vs. a silty soil in a soils
layer. We use the word "weightings" when assigning the relative values of
different layers, for example, how we weight the values in an elevation
layer vs. the values in a landuse layer. Keep in mind that weightings for
very different criteria are made comparable by considering how well they
satisfy the goal of the analysis.


Rankings Within Criteria 


Each criterion in our cartographic model is usually expressed by a data
layer, or "criterion layer." Each criterion layer is a spatial
representation of some constraint or selection condition; for example, the
criterion that we build outside a floodplain may consist of a set of numbers
in a layer identifying floodplain locations. Floodplain sites may be
assigned a value of 0, and upland sites a value of 1. Before we can assign a
value to any site, we must first obtain (or make) floodplain maps and
interpret the codes in the maps to delineate the most flood-prone areas.
This process allows us to rank areas based on the likelihood of flooding.

We must explicitly formalize our ranking for each layer used to represent a
criterion. One early decision is whether ranks should be discrete or
continuous (Figure 13-5). Rankings are discrete when input data are
interpreted such that the criterion data layer is a map of discrete values.
Soils are either good or bad for construction, slopes either too steep or
acceptable, and the final map defines two or more discrete classes; for
example, sites are categorized as either suitable or unsuitable. Ranks are
continuous when they vary along a scale; for example, soils may be rated
from 1 to 100 for construction suitability.


Figure 13-5: Rankings within a layer are either discrete (top right) or
continuous (bottom).

Figure 13-5, top right, shows the assignment of discrete ranking of land
productivity based on values in a soil layer. The source layer in the top
left of the figure is analyzed. If the expected production for a given soil
polygon is less than 66, then the output ranking is set to 0. If the
production is greater than or equal to 66, then the output rank is set to 1.
A range of input values has been placed into two discrete classes,
illustrated as the discrete rank layer in the top-right part of
Figure 13-5.
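
The discrete ranking rule just described can be sketched in a few lines of Python. This is only an illustration: the polygon names and production values are hypothetical, and only the threshold of 66 comes from the text.

```python
# Hypothetical expected-production values for a few soil polygons.
production = {"soil_A": 42, "soil_B": 66, "soil_C": 87, "soil_D": 65}


def discrete_rank(value, threshold=66):
    """Return 1 when production meets or exceeds the threshold, else 0."""
    return 1 if value >= threshold else 0


# Build the discrete rank "layer": one rank per soil polygon.
ranks = {soil: discrete_rank(value) for soil, value in production.items()}
print(ranks)
```

Each polygon receives exactly one of two classes, so the output is a binary criterion layer.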


Discrete rankings are most often used 
when there are clear, discrete classes to be
represented in criteria. A disease may be 
present or absent, forest stands coniferous or 
deciduous, or a lock inventoried or not. The 
values to be represented are discrete catego- 
ries. 


In contrast, we may apply criteria as continuous rankings within a
cartographic model. These continuous rankings provide a range of values to
characterize a suitability or restriction, and they result in a set of
incrementally varying ranks. Ranks (or scores) typically range over a real
or large integer interval, for example, from 0 to 1 or 0 to 1,000. Highest
suitability is usually assigned to the highest rank, and lowest to the
bottom.

The bottom right of Figure 13-5 shows a continuous ranking over a range of 0
to 10. A high value of 10 is specified for the most productive soils, and a
low value of 0 for the least productive. We may use production data for the
various soil types and a map of soil types to assign the relative value of
each soil. We could scale production from the lowest to the highest observed
over the range of 0 to 10, and in so doing create a layer that represents a
soil productivity criterion.
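
A minimal sketch of this scaling, assuming a simple linear stretch between the lowest and highest observed production, is given below. The production values are hypothetical, and a linear stretch is only one of several reasonable choices.

```python
def scale_to_range(values, low=0.0, high=10.0):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All inputs are equal; the choice of returning `low` is arbitrary.
        return [low for _ in values]
    span = vmax - vmin
    return [low + (v - vmin) * (high - low) / span for v in values]


# Hypothetical production values, one per soil type.
production = [20.0, 45.0, 70.0, 120.0]
ranks = scale_to_range(production)
print(ranks)
```

The least productive soil receives a rank of 0, the most productive a rank of 10, and the others fall proportionally in between.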


We are not constrained to linear or always increasing or decreasing
relationships between our input layers and our criteria layers. There may be
complex relationships between an input value and our output ranking scores
or values. Any curve or relationship we can create with a combination of
mathematical and logical functions may be represented, to reflect
increasing, decreasing, or complex relationships.

We should have some justification for adopting a specific curve when
establishing relative ranks within a layer. For example, we may wish to
represent the mercury hazard based on methyl mercury concentrations in water
supplies across a state for determining locations to drill wells for
drinking water. There may be a broad range of low mercury concentrations for
which there are no or few negative health impacts. However, as a threshold
concentration is reached, there may be a rapid upturn into a very steep
curve, where the risk of severe damage is great (Figure 13-6). The shapes of
these curves should be established through sets of epidemiological studies,
in which mercury concentration in human blood or tissue was related to
drinking water, and health impacts were recorded for thousands of people at
various levels of mercury exposure.


[Figure 13-6; axis label: health risk or effects]

Figure 13-7 illustrates two examples of continuous criteria scores.
Figure 13-7a shows the representation of a complex road criterion for a
cartographic model. This criterion specifies that desirable sites are
greater than 300 but less than 2,000 m from a road. The top left graphic of
Figure 13-7a shows the original roads layer. Following the arrows
counterclockwise, you find the distance layer, a raster with the distance
from the nearest road recorded in each cell value. In this example, the
distances range from 0 to 6,000 m. The graphic in the lower right of
Figure 13-7a shows a suitability assignment function. Distance values are
recorded along the horizontal axis, and are used to assign suitability for
building, shown on the vertical axis. This function assigns suitability
scores of 0 for distances less than 300 m. Suitabilities increase as
distance increases, in a linear fashion, to a score of 1 at a distance of
1,150 m, halfway between 300 and 2,000. Scores then decline linearly to a
value of 0 at 2,000 m, and remain 0 for all distances greater than 2,000 m.
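
The road-distance suitability function described above can be written as a small piecewise-linear function. The sketch below uses the 300 m and 2,000 m limits from the text; the function and parameter names are our own.

```python
def road_suitability(distance_m, near=300.0, far=2000.0):
    """Piecewise-linear score: 0 at or below `near`, rising to 1 at the
    midpoint between `near` and `far`, then falling back to 0 at `far`."""
    peak = (near + far) / 2.0  # 1,150 m for the default limits
    if distance_m <= near or distance_m >= far:
        return 0.0
    if distance_m <= peak:
        return (distance_m - near) / (peak - near)
    return (far - distance_m) / (far - peak)


# Apply the function to a few cells of a hypothetical distance raster.
for d in (100, 300, 725, 1150, 1575, 2000, 6000):
    print(d, road_suitability(d))
```

In practice the function would be applied to every cell of the distance layer to produce the road suitability layer.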


Figure 13-7b illustrates a continuous ranking of suitability scores, in this
instance for slope. Slopes are calculated from the elevation layer
(Figure 13-7b, left), ranging from 0 to 49.6 degrees for this data set.
Slope values are transformed to continuous slope suitability values using a
smoothly decaying function (lower right, Figure 13-7b). These values are
assigned to each cell location in an output slope suitability data layer
(top right, Figure 13-7b).
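
The text does not give the form of the smoothly decaying slope function, so the sketch below assumes an exponential decay purely for illustration. The `scale_deg` parameter is hypothetical and would need to be justified by local construction-cost information.

```python
import math


def slope_suitability(slope_deg, scale_deg=15.0):
    """Assumed form: suitability decays exponentially from 1 at zero slope.
    `scale_deg` (hypothetical) sets how quickly suitability falls off."""
    if slope_deg < 0:
        raise ValueError("slope must be non-negative")
    return math.exp(-slope_deg / scale_deg)


# Slopes spanning the 0 to 49.6 degree range reported for the example data.
for s in (0.0, 10.0, 20.0, 30.0, 49.6):
    print(s, round(slope_suitability(s), 3))
```

Any smooth, decreasing curve could be substituted; the key property is that flatter cells receive higher scores.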

Note that these continuous rankings may be combined, often through a
weighted addition process, to generate a combined suitability score. The
various suitability layers sum vertically to give a total composite score
for each cell. This score may be used to rank areas on relative suitability.
Discrete and continuous suitability layers may be combined using a mix of
Boolean and addition operations to provide a final ranking. This combination
often requires that we define the relative importance of each criterion
layer, a process known as weighting among criteria.
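
One way to picture this combination is sketched below, with each raster layer flattened to a simple list of cell values. The layer values, the equal weights, and the rule that a failed Boolean criterion sets the score to 0 are all assumptions made for the illustration.

```python
def combine(continuous_layers, weights, boolean_layers=()):
    """Weighted sum of continuous layers per cell; any cell failing a
    Boolean (0/1) layer is set to 0."""
    n_cells = len(continuous_layers[0])
    combined = []
    for cell in range(n_cells):
        score = sum(w * layer[cell] for w, layer in zip(weights, continuous_layers))
        if not all(mask[cell] for mask in boolean_layers):
            score = 0.0
        combined.append(score)
    return combined


# Hypothetical four-cell layers.
road_suit = [0.0, 0.5, 1.0, 0.5]    # continuous scores
slope_suit = [1.0, 0.5, 0.0, 1.0]   # continuous scores
soil_ok = [1, 1, 1, 0]              # discrete (Boolean) criterion

total = combine([road_suit, slope_suit], [0.5, 0.5], [soil_ok])
print(total)
```

The last cell has the highest weighted sum but is excluded because it fails the discrete soil criterion.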


Weighting Among Criteria 

Distinct criteria must be combined in many spatial analyses, usually in some
overlay or addition process (Figure 13-8). We must choose how to weight one
layer relative to another. How important is slope relative to aspect? Will
an optimum aspect offset a moderately steep site? How important is isolation
relative to other factors? Because the criteria will be combined in a
suitability data layer, the relative weightings given each criterion will
influence the results. Different relative weights are likely to result in
different suitability rankings. It is often difficult to assign these
relative weights in an objective fashion, particularly when suitability
depends on nonquantifiable measures.

The assignment of relative weightings is easiest when the importance of the
various criteria may be expressed on a common scale. In our example, we may
be able to assign monetary costs to increasing slope. Septic costs could be
estimated from soil type because different septic systems may be required
for different soil types, either through larger drain fields or needing
pumps and equipment for more sophisticated systems. Nuisance cost for noise
and distance costs in lost time or travel might also be quantified
monetarily. Reducing all criteria to a common scale, like money, removes
differential weighting among criteria.

There are many instances where a common measurement scale is not possible.
Many rankings are based on variables that are difficult to quantify.
Personal values may define the distances from a road that constitute
"isolated" versus "private," or what is the relative importance of slope vs.
construction cost. Expert opinion, group interviews, or stakeholder meetings
may be used to rank when there are multiple or competing parties. The scales
for these variables are inherently different, and there is no clear way to
translate them to a common scale.

Note that in this example, we assume that the values in each layer
associated with each criterion have appropriate ranges, or at least are on
similar scales. In Figure 13-8, the values for criterion layer A and
criterion layer B vary over an approximately equal range. If one layer had a
range from 1,000 to 5,000 and the other had values of 1 to 5, then this
would affect the combination and final suitability ranking unless the ranges
are standardized to a common range.
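
The effect of mismatched ranges can be seen in a short sketch. The two three-cell layers below are hypothetical, and min-max standardization to the range 0 to 1 is only one common option.

```python
def standardize(layer):
    """Min-max standardization of a layer to the range 0 to 1."""
    lo, hi = min(layer), max(layer)
    if hi == lo:
        raise ValueError("layer has no range to standardize")
    return [(v - lo) / (hi - lo) for v in layer]


layer_a = [1000, 3000, 5000]   # ranges from 1,000 to 5,000
layer_b = [5, 3, 1]            # ranges from 1 to 5, opposite trend

# Without standardization, layer A swamps layer B in the sum.
raw_sum = [a + b for a, b in zip(layer_a, layer_b)]

# After standardization, both layers contribute equally.
std_sum = [a + b for a, b in zip(standardize(layer_a), standardize(layer_b))]

print(raw_sum)
print(std_sum)
```

In the raw sum the cell ordering follows layer A alone; after standardization the opposing trends in the two layers balance.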


In addition, many combinations implicitly assume that the scales are
approximately linear in our ranking within and across the criteria. We often
combine the values within a criterion layer using an arithmetic operation,
for example, by summing values with weights. The relative weights among and
within each layer are mixed, which is often a logical course of action under
an assumption of linearity. Strongly nonlinear relationships in the ratings
and weightings scales often lead to counterintuitive and unwanted
suitabilities.

[Figure 13-8: criterion layers combined, using weights, into a suitability
layer]

One method of assigning weights is based on their "importance ranking." The
factors (criteria) used to decide the quality of a site may be ranked in
their importance, from most important to least. We may then calculate the
relative weights according to:

$$ w_i = \frac{n - r_i + 1}{\sum_{k=1}^{n} (n - r_k + 1)} \tag{13.1} $$

where w_i is the weighting for criterion i, r_i is the importance rank of
criterion i (1 for the most important), n is the number of criteria, and k
is a counter for summing across all criteria.

Suppose we wish to rank sites for store placement based on four factors:
distance to nearest competitor, distance to nearest major road, parking
density, and parcel cost. Figure 13-9 shows an example calculation of
criteria weights based on importance ranking. Each criterion is listed in
the leftmost column. Ranks are assigned to criteria by the planner, client,
decision-maker, or interested group. The numerator of equation 13.1 is
calculated for each criterion, giving the most important criterion the
highest value and the least important the lowest value. The denominator is
calculated by summation, and then the individual weights calculated, as
shown in the rightmost column. These weights may then be used to combine the
data from the various criteria.

[Figure 13-9: criteria weights calculated according to equation (13.1)]
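
The importance-ranking calculation can be sketched as follows. The numerator used here, n minus the rank plus 1, is our reading of the description that the most important criterion receives the highest value; the ranks assigned to the four store-placement factors are hypothetical, since the values in Figure 13-9 are not reproduced in the text.

```python
def importance_weights(ranks):
    """Weights from importance ranks (1 = most important).
    Numerator: n - rank + 1; denominator: sum of all numerators."""
    n = len(ranks)
    numerators = {name: n - rank + 1 for name, rank in ranks.items()}
    denominator = sum(numerators.values())
    return {name: value / denominator for name, value in numerators.items()}


# Hypothetical importance ranks for the four factors named in the text.
ranks = {
    "distance to nearest competitor": 1,
    "distance to nearest major road": 2,
    "parking density": 3,
    "parcel cost": 4,
}
weights = importance_weights(ranks)
for name, weight in weights.items():
    print(f"{name}: {weight:.1f}")
```

With four criteria the numerators are 4, 3, 2, and 1, the denominator is 10, and the weights sum to 1.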

There are many other methods for defining the values for each criterion
layer and the relative weightings among layers. These include methods that
attempt to ensure consistency among weights, but they are beyond the scope
of this introductory text. You may find more detailed discussions in the
excellent book by Malczewski (1999) listed in the Suggested readings section
at the end of this chapter.


Cartographic Models: A Detailed Example


Here we provide a detailed description of the steps involved in specifying
and applying a cartographic model. We use a refinement of the general
criteria for home-site selection described in the previous section. These
general criteria are listed on the left side and the refined criteria are
shown on the right side of Table 13-1. The refined criteria may have been
defined after further discussion with the decision-makers, local area
experts, and a review of available data and methods.

Table 13-1: Original and refined criteria for cartographic model example.

General Criteria                      | Refined Criteria
Slopes not too steep                  | Slopes < 30 degrees
Southern aspect preferred             | 90 < Aspect < 270
Soils suitability                     | Specific list of septic-suitable soil units
Far enough from road to provide       | 300 meters < distance to road < 2,000 meters
privacy, but not isolated             |


Note that we adopt the simplest weighting and ranking scheme in applying the
criteria in Table 13-1. All criteria are equally weighted, and all criteria
are binary: land is categorized as unsuitable or suitable based on each
criterion. A location must pass all criteria to be suitable, and the final
rating is suitable or unsuitable.


In our example, we will apply the cartographic model described by the
flowchart in Figure 13-4 to a small watershed in a mountainous study area.
Application of the refined criteria requires three base data layers:
elevation, soils, and roads. For this example we assume the three data
layers are available at the required positional and attribute accuracy, and
clipped to the study area. The need for new data layers often becomes
apparent during the process of translating the initial, general criteria to
specific, refined criteria, or during the development of the flowchart
describing the cartographic model. Once data availability and quality have
been assured, we can complete the final flowchart.

Figure 13-10 contains a flowchart of a cartographic model that may identify
suitable sites. Spatial data layers are shown as rectangles, and a
descriptive data layer name is included within the rectangle. Spatial
operations or functions are contained in ellipses, and arrows define the
sequence of data layers and spatial operations. The three base data layers
(elevation, soils, and roads) are shown at the top of the flowchart.

There are three main branches in the flowchart in Figure 13-10. The leftmost
branch addresses the terrain-related criteria, the center branch addresses
the soils criteria, and the right branch applies the road distance criteria.
All three branches join in the cartographic model, producing a final
suitability classification.

The spatial processing steps for the left branch of the cartographic model
are shown in detail in Figure 13-11. This and subsequent detailed figures
show a thumbnail of the spatial data layers at each step in the process.
Data layer names are adjacent to the spatial data layer. The first two
criteria involve terrain-related constraints. Suitable sites are restricted
to a set of slopes and aspects. These criteria require slope and aspect data
layers to be calculated and then classified into areas that do and do not
meet the respective criteria. The elevation data layer is shown at the top
of Figure 13-11: low elevations in black through higher elevations in
lighter shades. There are two main river systems in the study area, one
running from west to east in the northern portion of the study area, and one
running from south to north. Highland areas are found along the north, west,
and east margins of the study area.


Slope and aspect are derived from the elevation data layer (Figure 13-11).
Lower slope values are shown in light shades, higher slope values are shown
in dark shades, and aspects are shown in a range of light to dark shades
from 0 to 360 degrees. Slope and aspect layers are reclassified based on the
threshold values specified in the criteria listed in Table 13-1. A
reclassification table is used to assign values to the slope_suit variable
based on the slope layer. Cells with a slope_val less than 30 are assigned a
slope_suit of 1, while cells with a slope_val of 30 or higher are given a
slope_suit value of 0. Aspect values are also reclassified using a table.
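
The two reclassifications can be sketched with small nested lists standing in for the raster layers. The cell values are hypothetical; the thresholds are the refined criteria from Table 13-1 (slope less than 30 degrees, aspect between 90 and 270 degrees).

```python
def reclass_slope(slope_val):
    """Slope suitability: 1 when slope is under 30 degrees, else 0."""
    return 1 if slope_val < 30 else 0


def reclass_aspect(asp_val):
    """Aspect suitability: 1 when 90 < aspect < 270 degrees, else 0."""
    return 1 if 90 < asp_val < 270 else 0


# Hypothetical 2 x 3 rasters of slope and aspect, in degrees.
slope_val = [[5.0, 29.9, 30.0],
             [12.5, 41.0, 8.0]]
asp_val = [[180.0, 90.0, 200.0],
           [269.0, 135.0, 350.0]]

slope_suit = [[reclass_slope(v) for v in row] for row in slope_val]
asp_suit = [[reclass_aspect(v) for v in row] for row in asp_val]

print(slope_suit)
print(asp_suit)
```

Each output layer holds only 0 and 1, which is what makes the later overlay and raster-to-vector conversion inexpensive.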

Slope and aspect layers are combined in an overlay, converted from raster to
vector, and reclassified to produce a suitable terrain layer (Figure 13-12).
Raster to vector conversion is chosen because two of the three base data
layers are in a vector format, and because future complex selections might
be better supported by the attribute data structure used for vector data
sets. This conversion creates polygons that have the attributes of the input
raster data layer. Note this conversion takes place after the raster layers
have been reclassified into a small number of classes, and after the data
have been combined to a single layer in an overlay. Raster-to-vector
conversion proceeds more quickly after the number of raster classes has been
reduced and the data combined in a single terrain-suitability layer.

[Figure 13-11: reclass, slope < 30°]

The terrain overlay must then be reclassified to identify those areas that
meet both the slope and the aspect criteria (see the terrain suitability
coding in Figure 13-12). Those polygons with a 1 for both slope_suit and
asp_suit are assigned a value of 1 for terrain_suit. All others are given a
value of 0, indicating they are unsuitable home sites based on the slope
and/or aspect criteria.
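
The reclassification of the terrain overlay amounts to a table lookup on the pair of suitability codes. In the sketch below the polygon records are hypothetical and the attribute names follow our reading of the variable names used in the text.

```python
# Reclass table: (slope_suit, asp_suit) -> terrain_suit
TERRAIN_RECLASS = {
    (0, 0): 0,
    (0, 1): 0,
    (1, 0): 0,
    (1, 1): 1,
}

# Hypothetical polygons from the terrain overlay layer.
polygons = [
    {"id": 1, "slope_suit": 1, "asp_suit": 1},
    {"id": 2, "slope_suit": 1, "asp_suit": 0},
    {"id": 3, "slope_suit": 0, "asp_suit": 1},
    {"id": 4, "slope_suit": 0, "asp_suit": 0},
]

# Assign the terrain suitability code to each polygon.
for poly in polygons:
    key = (poly["slope_suit"], poly["asp_suit"])
    poly["terrain_suit"] = TERRAIN_RECLASS[key]

print([(p["id"], p["terrain_suit"]) for p in polygons])
```

Only the polygon that passes both criteria receives a terrain_suit value of 1.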

Because we wish to reduce the number of redundant polygons where possible, a
dissolve is applied after the reclassification. This substantially reduces
the size of the output data set, and speeds future processing. Reclassified,
dissolved terrain data are saved in a layer labeled Suitable Terrain
(Figure 13-12, center-right).

[Figure 13-12: the Slope reclass and Aspect reclass layers are combined in
the Terrain overlay layer, which is reclassified to produce the Suitable
Terrain layer. The terrain suitability code is assigned based on the reclass
table below. Only those polygons with suitable slope and aspect are assigned
a terrain_suit value of 1.]

slope_suit   asp_suit   terrain_suit
0            0          0
0            1          0
1            0          0
1            1          1

The central branch of the cartographic model in Figure 13-10 is shown in
Figure 13-13. Digital soil surveys are available that depict homogeneous
soil units as polygons. Attribute data are attached to each polygon,
including soil type and soil suitability for septic systems. Soils data for
the study area may be reclassified based on these septic suitability
attributes. A reclassification table assigns a value of 1 to the variable
soil_suit if the soil type is suitable for septic systems, 0 if the soil
type is not (Figure 13-13).

After reclassification, there may be many adjacent soil polygons with the
same soil_suit value. These are grouped using a dissolve operation (data
between reclass and dissolve are not shown in the figure; see Chapter 9 for
an example). The dissolve removes boundaries between like polygons, thereby
substantially reducing the number of polygons and hence the number of
entries in the attribute table. This may be particularly important with
complex data sets such as soil data, or with converted raster data, as these
often have thousands of entries, many of which will be combined after the
dissolve.

[Figure 13-13: reclassification based on the soil type. The variable
soil_suit is assigned a value 0 for unsuitable soil types and 1 for suitable
soil types.]


The right branch of the cartographic model in Figure 13-10 is presented in
Figure 13-14. The Roads data layer is obtained and major roads extracted.
This has the effect of removing all minor roads from consideration in
further analyses. What constitutes a major road has been defined prior to
this step. In this case all divided and multi-lane roads in the study area
were selected. Two buffers are applied, one at a 300 m distance and one at a
2 km distance from major roads. These buffers are then overlaid. Because the
buffer regions extend outside the study area, the buffers must be clipped to
the boundary of the study area. These data are then reclassified into
suitable and unsuitable areas, resulting in the Road buffers layer (lower
left, Figure 13-14).


All data layers are combined in a final set of overlays and
reclassifications (Figure 13-15). The Suitable access layer, derived from
the roads data and criteria, is combined with the Terrain & soils layer. The
All criteria layer contains the required spatial data to identify suitable
vs. unsuitable sites. This overlay layer must be reclassified based on the
road, soil, and terrain suitability variables, classifying all potential
sites into a suitable or unsuitable class. A final dissolve yields the final
digital data layer, Suitable sites.

This example analysis, while simple and limited in scope, illustrates both
the flexibility and complexity of spatial data analysis using cartographic
models. The cartographic model is simple because only three input spatial
data layers were required, and a small set of spatial data operations were
used. Most real analyses use many more data inputs, but the basics are
similar. Reclass, overlay, and other operations were used repeatedly. The
modeling is flexible because spatial operations may be tailored to the
problem. Finally, this example illustrates the complexity that can be
included in cartographic models, as over 20 different instances of a spatial
operation were applied, in a defined sequence, yielding at least 15
intermediate data layers as well as the final result layer.


Scripting and Models 


Many software packages provide scripting or programming environments to
specify a sequence of spatial operations (Figure 13-16). Examples include
ArcGIS ModelBuilder, the QGIS Batch Modeler and Graphical Modeler, and the
GRASS wxGUI. These modeling tools can create a chain of operations that may
be saved, re-run, shared, or applied with different input data or
parameters. The models may be viewed as a recipe for a set of spatial tools,
applied to data, with output from operations used as input in subsequent
operations.

The programming environments often allow complex flows, including
looping through various iterations of data or parameters, and
branching or termination based on conditions. Scripts are often quite
helpful both in saving a cartographic model so that it may be repeated
with new data, and in documenting the steps applied in an analysis.
Scripts are particularly helpful when processing must be applied
repetitively to different instances of the same type of data, e.g., to
re-project and then re-code hundreds of raster data tiles. This would
be tedious to complete with a standard graphical user interface, but
may take only minutes of user time when incorporated into a script.
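
A minimal sketch of such a script follows. It does not use any particular GIS package: the tiles are tiny in-memory grids, `flip_rows` is a hypothetical stand-in for a geometric step such as re-projection, and the names `run_model`, `recode`, and `steps` are assumptions for illustration. The point is only the pattern of one saved recipe looped over many inputs.

```python
from typing import Callable, Dict, List

Raster = List[List[int]]  # tiny in-memory stand-in for a raster tile


def recode(tile: Raster, lookup: Dict[int, int], default: int = 0) -> Raster:
    """Reclassify every cell with a lookup table; unknown codes get `default`."""
    return [[lookup.get(value, default) for value in row] for row in tile]


def flip_rows(tile: Raster) -> Raster:
    """Hypothetical placeholder for a geometric step such as re-projection."""
    return tile[::-1]


def run_model(tile: Raster, steps: List[Callable[[Raster], Raster]]) -> Raster:
    """Apply each operation in order; each output is the next step's input."""
    for step in steps:
        tile = step(tile)
    return tile


# The "recipe": the same ordered steps are reused for every tile.
steps = [flip_rows, lambda t: recode(t, {1: 10, 2: 20})]

# Real tiles would be read from disk; three made-up tiles are used here.
tiles = {f"tile_{i}": [[1, 2], [2, i]] for i in range(3)}
results = {name: run_model(tile, steps) for name, tile in tiles.items()}
```

Because the recipe is an ordinary list, it can be saved, shared, or re-run with a different lookup table, which mirrors how the graphical modelers store a chain of operations.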


Figure 13-16: A model developed with the wxGUI for GRASS, showing data and work flow.
Simple Spatial Models


Predictive spatial models are commonly applied to a range of
problems, particularly when there is a well-established model based
on point or small-scale observations and analysis, and when the
output is a continuous variable, for example, temperature, housing
value, soil erosion rates, or cancer frequency.

As noted earlier, our simple spatial models typically are based on
one or a few equations, described as:

$O = f(A, B, C, D, \beta_1, \beta_2, \ldots)$    (13.2)

where O is the spatially referenced output; f() is a mathematical
operation; A, B, C, and D are variables; and the β values are
equation parameters. For example, NASA has sponsored the development
of global models of gross primary productivity (GPP), the total
biomass produced globally by plants in any given year. One common
model takes the form:


$GPP = \varepsilon \cdot NDVI \cdot PAR$    (13.3)

where NDVI is a satellite-based measure of plant abundance, PAR is
the amount of sunlight available for photosynthesis, and ε is a
conversion efficiency, which may be fixed or may depend on additional
factors, such as vegetation or soil type. In this example, our
equation is a simple multiplication of the components, and ε is the
only parameter in the simplest case of a fixed ε. In more complicated
forms, there is a different ε for each vegetation type, applied
accordingly.
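
The per-vegetation-type form can be sketched as a cell-by-cell calculation. The grids, the vegetation class names, and the efficiency values below are made up for illustration, and `gpp_layer` is a hypothetical helper, not part of any NASA product code.

```python
def gpp_layer(ndvi, par, veg_type, epsilon_by_type):
    """Cell-by-cell GPP = epsilon * NDVI * PAR.

    `epsilon_by_type` maps each vegetation class to its conversion
    efficiency, so the parameter varies across the landscape while the
    equation stays the same.
    """
    return [
        [epsilon_by_type[v] * n * p for n, p, v in zip(ndvi_row, par_row, veg_row)]
        for ndvi_row, par_row, veg_row in zip(ndvi, par, veg_type)
    ]


# Made-up 2 x 2 input layers on a common grid.
ndvi = [[0.5, 0.8], [0.2, 0.6]]
par = [[10.0, 10.0], [8.0, 8.0]]
veg = [["forest", "forest"], ["grass", "forest"]]
epsilon = {"forest": 1.0, "grass": 0.5}  # illustrative values only

gpp = gpp_layer(ndvi, par, veg, epsilon)
```

Using a single entry in `epsilon` for every class reduces this to the fixed-efficiency case.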
Simple spatial models require spatial fields of all variables, and
appropriate parameters for all conditions in the modeled area. We
must have estimates of NDVI and PAR over our prediction region. In
this specific case, robust measurements of NDVI have been developed
based on repeat satellite measurements, as have methods to estimate
PAR from the available meteorology networks and measurement systems.
Values of ε have been estimated for dominant vegetation types, along
with how these parameters vary with other environmental factors like
temperature and available moisture. Model estimates have been
compared to measurements across a broad range of conditions.
While we call these simple spatial models, as the previous and
subsequent examples will show, it is often time consuming and
difficult to develop the spatial data and estimate the parameters to
apply these models across space. The models are often based on
observed relationships and measurements at points or small plots, for
example, crop growth on sunny vs. cloudy days, or the change in GPP
across nearby forest stands with different NDVI values. These may
suffice to estimate ε for the specific types, but differences among
vegetation types may require repeat measurements over a broad range
of conditions. A network of field stations, perhaps in combination
with remotely sensed data, may be required to estimate the input
variables, for example, PAR at the required intervals across the
landscape.


Another example of a spatial model is the Revised Universal Soil Loss
Equation (RUSLE) and its precursor, the Universal Soil Loss Equation
(USLE), which are among the most widely used simple spatial models:

$E = R \cdot K \cdot C \cdot P \cdot L \cdot S$

where E is average annual erosion, R is a rainfall factor, K reflects
soil erodibility, C integrates crop effects, P accounts for
management practices, L reflects slope length, and S represents
steepness.

The USLE and RUSLE predict soil erosion on farm fields, and have been
under development since the 1930s. Rainfall intensity, soil
properties, crop type, slope steepness, and slope length factors have
been measured in tens of thousands of plots. Supporting information
has been developed for the entire country by the U.S. Natural
Resources Conservation Service, including soil and climate factors
for the United States, and the impacts of common crop types and
management regimes. The USLE and RUSLE have also been widely applied
in other countries.

The RUSLE has been widely applied within a GIS framework for erosion
estimates at catchment or larger scales. The model is relatively
simple, much of the input data have been developed and are publicly
available, and the outputs are of broad interest. Methods for
applying the model have varied, in part because the model was
developed for individual fields, but spatial data are often not
available on a per-field basis. While the rainfall factor, R, is
generally similar across county-sized areas spanning tens of
kilometers, other factors often change on a field or subfield basis.
Applications often differ in the methods for estimating the
management and crop factors, and in particular the combination of
slope length and steepness factors.


Estimating driving variables across space often presents choices, as
illustrated in the calculation of the RUSLE slope factors, L and S
(Figure 13-17). Simple spatial models are often based on small-area
studies for which all variables may be easily measured. This is often
not true when applying the models to larger areas. Slope steepness
(the S factor) is easily estimated within a raster framework, but
slope length (L) is considered uniform at a fixed length of 22.1 m
(72.6 ft) in the standard RUSLE. Application of the RUSLE to
convergent or divergent slopes, or to lengths or cell sizes different
than the standard, may result in prediction errors. This challenge
has been the focus of many studies, and the book chapter by Wilson
and Lorang, listed in the references, describes some of the methods
used to effectively estimate a combination of L and S.

Figure 13-17 (annotation): One of various measurement-derived functions is applied, incorporating both slope angle and slope length, to calculate the LS factor.


The remaining K, C, and P factors may be derived from standard
spatial datasets, for example, NASS or NLCD data for land cover, crop
type, and treatment, and K factors from SSURGO data (Figure 13-18).
Application of the model to the spatial data, here in a cell-by-cell
multiplication, yields estimates of erosion across a region.

Figure 13-18: USLE/RUSLE erosion estimates may be calculated from appropriately developed base
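
The cell-by-cell multiplication can be sketched as follows. The factor values are invented, the rainfall factor is passed as a single number on the assumption that it is uniform over the small example area, and the helper names `as_grid` and `rusle` are hypothetical.

```python
import math

FACTORS = ("R", "K", "C", "P", "L", "S")


def as_grid(value, rows, cols):
    """Expand a single number to a full grid; pass grids through unchanged."""
    if isinstance(value, (int, float)):
        return [[float(value)] * cols for _ in range(rows)]
    return value


def rusle(layers, rows, cols):
    """Cell-by-cell product E = R * K * C * P * L * S over aligned rasters."""
    grids = [as_grid(layers[name], rows, cols) for name in FACTORS]
    return [
        [math.prod(grid[r][c] for grid in grids) for c in range(cols)]
        for r in range(rows)
    ]


# Made-up factor layers for a 1 x 2 grid; R is treated as uniform.
layers = {
    "R": 100.0,
    "K": [[0.3, 0.2]],
    "C": [[0.5, 0.1]],
    "P": [[1.0, 1.0]],
    "L": [[1.0, 2.0]],
    "S": [[1.0, 0.5]],
}
erosion = rusle(layers, rows=1, cols=2)
```

All factor rasters must share the same cell size and alignment for the product to be meaningful; how L and S are derived from the DEM is a separate choice, as discussed above.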
Spatio-Temporal Models


Spatio-temporal models have been developed and applied in a number of
disciplines. This is an active area of both research and application,
as there are many fields of study and management that require
analysis and predictions of spatially and temporally varying
phenomena. We will briefly discuss some basic characteristics of
spatio-temporal models. We will then describe their differences from
other models, discuss some basic analysis approaches, and describe
two examples of spatio-temporal models.

Spatio-temporal models use spatially explicit inputs to calculate or
predict spatially explicit outputs (Figure 13-19). Rules, functions,
or some other processes are applied using spatial and often
nonspatial data. Input variables such as elevation, vegetation type,
human population density, or rainfall may be used as inputs to one or
more mathematical equations. These equations are then used to
calculate a value for one or more spatial locations. The values are
often saved in a spatial data format, such as a layer in a GIS.


Spatio-temporal models involve at least a three-dimensional
representation of one or more key attributes: variation in planar
(X-Y) space and through time. A fourth dimension may be added if the
vertical (Z) direction is also modeled. We typically treat spatially
variable network analyses separately, because networks are
constrained to a subset of two-dimensional space.

Spatio-temporal models may also be classified by a number of other
criteria: whether they treat continuous fields or discontinuous
objects, whether they are process based or rely on purely fit models,
and whether they are stochastic or deterministic. Combinations of
these model characteristics lead to a broad array of spatio-temporal
model types.


Figure 13-19: Diagram of a spatio-temporal model, in which spatial data and functions produce spatial output.
Models of continuous phenomena predict values that vary smoothly
across time or space. Air temperature, precipitation, soil moisture,
and atmospheric pollutants are examples of continuous variables that
are predicted using spatio-temporal models. Soil moisture this month
may depend on soil moisture last month and the temperature,
precipitation, and sunshine duration in the intervening period. All
these factors may be entered in spatial data layers, and the soil
moisture predicted for a set of points.


Models of discrete phenomena predict spatial or attribute
characteristics for discontinuous features. Boundaries for vegetation
types are an example of features that are often considered discrete.
We use a line to identify the separation between two types, for
example, between a grassland and a forest. A spatial model may
consider the current position of the forest and grassland as well as
soil type, fire prevention, and climatic data to predict the
encroachment of forest on grassland sites. The boundaries between new
forests and grasslands are always discrete, although their positions
shift through time.


Models can be process based or calibrated via model fit. Models are
process based if their workings in some way represent a theoretical
or mechanistic understanding of the processes underlying the observed
changes. In contrast, models are purely fit models when they are
calibrated against data without trying to capture underlying
mechanisms. We may predict the amount of water flowing in a stream by
a detailed spatial representation of the hydrologic cycle. Many
processes may be explicitly represented by equations or subroutines
in a spatial model. For example, rainfall location and intensity may
be modeled through time for each raster cell in a study area. We can
then follow the rainwater as it infiltrates into the soils and joins
the stream system through overland flow, subsurface flow, and routing
through stream channels. Calculations for these processes may be
based on slope, topography, and channel characteristics. These
processes are tied together in space. Calculations are performed at
each point on the landscape; these calculations increase or decrease
water flow or other conditions at adjacent, downslope locations.

Stream flow might instead be modeled using a purely fit, statistical
approach. A purely fit model might simply measure precipitation in
the previous hour, average the precipitation for the previous week
and previous month, and predict stream flow at a point. Processes
such as evaporation or subsurface flow are not explicitly
represented, and the output may be a statistical function of the
inputs. The model may be more accurate than a process-based approach,
in that the predicted outflow at any point in the stream may be
closer to measured values than those derived from a process-based
model. Conversely, the output may be poorer, in that the measurements
may be farther from predictions, especially under novel conditions.
Process modelers argue that by incorporating the structure and
function of the system into a process model we may better predict
under new conditions, for example, for extreme drought or rainfall
events never experienced before. They also argue that process models
aid in our understanding of a system and in generating new hypotheses
about system function.
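
As a hedged illustration of a purely fit model, the sketch below calibrates a straight line between a single precipitation summary and measured stream flow using ordinary least squares. The data are invented, only one predictor is used for brevity, and `fit_line` is a hypothetical helper; nothing about evaporation, infiltration, or routing appears in the model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up calibration data: weekly precipitation vs. measured stream flow.
weekly_precip = [1.0, 2.0, 3.0, 4.0]
stream_flow = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(weekly_precip, stream_flow)

# Prediction for a new precipitation value. It lies outside the calibration
# range, which is exactly where a purely fit model is least trustworthy.
predicted_flow = intercept + slope * 5.0
```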

Besides being continuous or discrete and process or fit, models may
be stochastic or deterministic. A deterministic model provides the
same outputs every time it is given exactly the same inputs. If we
enter the same set of variables into a model, it will always produce
exactly the same results. A stochastic model will not. Stochastic
models often have random generation or some other variability
generation procedures that change model results from run to run, even
when using exactly the same inputs.

A disease spread process is a good example of a phenomenon that might
be modeled with a stochastic process. Disease may occur at a set of
locations, and may be spread through the atmosphere, spread through
water, or carried by animals or humans to initiate new disease
centers. A doctor might model disease infection and growth
stochastically. A random number might be generated, and the new
center started at a location based on this number. The doctor might
use another totally or partially random process to control how the
new infection center grows or disappears in the spatial model. Thus,
the map of disease locations after different model runs may differ,
even though the runs were initiated with identical input conditions.
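
A toy version of such a stochastic spread model is sketched below. The grid size, the four-neighbor spread rule, the spread probability, and the function names are all assumptions for illustration; this is not an epidemiological model. Fixing the random seed makes a run repeatable, while different seeds generally give different maps from identical inputs.

```python
import random


def spread_once(infected, size, p_spread, rng):
    """One time step: each infected cell may infect its four neighbors."""
    new = set(infected)
    for r, c in sorted(infected):  # sorted so a given seed is repeatable
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < size and 0 <= nc < size
            if inside and rng.random() < p_spread:
                new.add((nr, nc))
    return new


def run_model(seed, steps=5, size=10, p_spread=0.3):
    """Start one infection at the grid center and spread it for `steps` steps."""
    rng = random.Random(seed)
    infected = {(size // 2, size // 2)}
    for _ in range(steps):
        infected = spread_once(infected, size, p_spread, rng)
    return infected


run_a = run_model(seed=1)
run_b = run_model(seed=1)  # same seed: identical map
run_c = run_model(seed=2)  # different seed: usually a different map
```

Running the model many times with different seeds gives a distribution of outcomes, which is often the quantity of interest in stochastic modeling.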

With most spatial models, the target location of the model output is
usually, but not always, the location of the inputs. For example, a
demographics model may use a combination of current population in a
census tract, housing availability and cost, job opportunities and
location, general migration statistics, and age and marital status of
those currently in the census tract to predict future population for
the census tract. This model has a target location, the census tract,
that is the same as the location for most of the input data.

In contrast, the target location of the model outputs may be
different than the location of the inputs. Consider a fire behavior
model. This model might predict the location of a wildfire based on
current fire location, wind speed, topography (fires burn faster
upslope than down), and vegetation type and condition. Fire models
often incorporate mechanisms to predict fire spread beyond the
current burn front of a fire. Embers often are lifted above a fire by
the upwelling heated air. These embers may be blown well in advance
of a fire front, starting spot fires at some distance away from the
main fire. In this case, the target location for a calculation in the
spatial model is not the same as the input locations.


Cell-Based Models 


Spatio-temporal models often are implemented as cell-based models. A
cell-based model invokes a set of functions and logic, driven by cell
values, to update these or other cell values through time. Input
values at a starting time, t0, may be derived from multiple layers.
These input values are entered into functions that calculate the new
values for the target layer or layers at the next time step, t1. The
process is then repeated, and the values in the target layer(s)
evolve through time (Figure 13-20).
The previous example of erosion due to surface runoff shows how
cell-based modeling can be extended beyond the static analysis of the
USLE. Although there are many erosive forces, water is the primary
cause of erosion over most of the globe. The amount of soil erosion
depends on many factors, including the rainfall rate, how fast the
rainwater is absorbed by the soil (permeability), the type of soil,
the slope at the site, and how much water is flowing from uphill
cells. Some of these properties do not usually change over a
rainstorm, for example, slope or soil type, while other features do,
such as rainfall rate and downflowing water. All of these factors may
be provided as cell-based layers, some that change with time, and
some that are static. These layers are then included in an equation
to calculate erosion at each cell location for a grid. Rainfall and
flow rates may be updated at each step, and the resultant erosion
calculated and placed in an output layer, as shown in Figure 13-20.
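
The time-stepping logic can be sketched on a one-dimensional transect of cells ordered from uphill to downhill. The erosion rule used here (erodibility times slope times runoff) is a made-up placeholder, not the USLE or any published equation, and all names and values are assumptions for illustration. Slope, erodibility, and infiltration are static layers; rainfall changes at each step.

```python
def step(rain, infiltration, slope, erodibility, erosion):
    """Advance one time step along a transect ordered uphill to downhill.

    Updates `erosion` in place and returns it. Runoff at a cell is the
    rain not absorbed by the soil plus all runoff arriving from uphill.
    """
    inflow = 0.0
    for i in range(len(slope)):
        runoff = max(rain[i] - infiltration[i], 0.0) + inflow
        erosion[i] += erodibility[i] * slope[i] * runoff  # illustrative rule
        inflow = runoff  # surface water passes to the next cell downhill
    return erosion


# Static layers (do not change during the storm).
slope = [0.1, 0.2, 0.1]
erodibility = [1.0, 1.0, 2.0]
infiltration = [1.0, 1.0, 1.0]

# Dynamic layer: one rainfall grid per time step.
rain_series = [[3.0, 3.0, 3.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]

erosion = [0.0, 0.0, 0.0]  # output layer, accumulated through time
for rain in rain_series:
    step(rain, infiltration, slope, erodibility, erosion)
```

A two-dimensional version would replace the simple uphill-to-downhill ordering with routing along a flow direction layer.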


Example 1: Process-Based 
Hydrologic Models 


Water flows downhill. This simple
knowledge was perhaps sufficient until 
humans began to build houses and roads, 
and populations grew to dominate most of 
the Earth's land surface. Land scarcity has 
led humans to build in low-lying areas, and 
farming, wetland drainage, and upstream 
development have all contributed to more 
frequent and severe flooding. 

Water models are needed because demand for water resources exceeds
the natural supply in many parts of the world. Population pressures
have driven farms, cities, and other human developments into
flood-prone areas; these same developments have increased the speed
and amount of rainfall runoff, thereby increasing flood frequency and
severity. These factors are spurring the development of
spatio-temporal hydrologic models. The models are often used to
estimate stream water levels, such that we may better manage water
resources and avoid loss of property or life due to flooding.


Many spatio-temporal hydrologic models predict the temporal
fluctuations in soil moisture, lake or stream water levels, and
discharge in hydrologic networks. The network typically consists of a
set of connected rivers and streams, including impoundments such as
lakes, ponds, and reservoirs (Figure 13-21). This network typically
has a branching pattern. As you move upstream from the main discharge
point for the network, streams are smaller and carry less water.
Water level or discharge may be important at fixed points in the
hydrologic network, at fixed points on land near the network, or at
all points in the landscape. The hydrologic network is often embedded
in a watershed, defined as the area that contributes downslope flow
to the network.

Spatially explicit hydrologic models are almost universally dependent
on digital elevation data. DEMs define watershed boundaries, water
flow paths, the speed of downslope movement, and stream location
(Chapter 12). Slope, aspect, and other factors that affect hydrologic
systems may be derived from DEMs. For example, evaporation of surface
water and transpiration of soil water depend on the amount of solar
radiation. Site solar radiation depends on the slope and aspect at
each point, and in mountainous terrain it may also depend on
surrounding elevations, due to shading. Site-specific variables
representing slope and aspect are used when estimating evaporation or
plant use of water.

Figure 13-21: An example of a hydrologic network of lakes and rivers.


Slope and aspect are often used to define an important spatial data
layer in hydrologic modeling: flow direction. This layer defines the
direction of water flow at important points on the surface. If a
raster data structure is used, flow direction is calculated for every
cell. If a vector data structure is used, flow direction is defined
between adjacent or connected vector elements.
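
For the raster case, one widely used convention is steepest descent among the eight neighboring cells, often called D8. The text above does not say which method a given model uses, so the sketch below is an assumed example of the idea, with a made-up DEM and a hypothetical function name.

```python
import math

# Offsets (d_row, d_col) to the eight neighbors of a cell.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem, cell_size=1.0):
    """For each cell, return the offset of its steepest downhill neighbor.

    Cells with no lower neighbor (sinks or flats) get None. Diagonal
    neighbors are farther away, so their elevation drop is divided by a
    longer distance.
    """
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            steepest = 0.0
            for dr, dc in OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = cell_size * math.hypot(dr, dc)
                    drop = (dem[r][c] - dem[nr][nc]) / distance
                    if drop > steepest:
                        steepest = drop
                        direction[r][c] = (dr, dc)
    return direction


# Made-up DEM that drains toward the lower-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 3],
]
flow = d8_flow_direction(dem)
```

The lower-right cell has no lower neighbor, so it receives no direction; in a real DEM such cells are either true outlets or the sinks discussed later in this section.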


Many hydrologic models represent water flow through raster grid cells
(Figure 13-22). Water falls on each cell via precipitation.
Precipitation either infiltrates into the soil or flows across the
surface, depending on the surface permeability at the cell. For
example, little water infiltrates for most human-made surfaces, such
as parking lots or buildings. These sites have low permeability, so
most precipitation becomes surface flow. Conversely, nearly all
precipitation infiltrates into most forest soils.

Downslope water flow is also calculated in the model, depending on a
number of factors at each cell. Slope and flow direction determine
the rate at which water flows downhill. Downslope flow eventually
reaches the hydrologic network and is routed through the network to
the outlet. Mathematical functions describing cell-specific
precipitation, flows, and discharge may be combined to predict the
flow quantity and water level at points in the watershed and through
the network.

Spatio-temporal hydrologic models often require substantial data
development. Elevation, surface and subsurface permeability,
vegetation, and stream network location must be developed prior to
the application of many hydrologic models. DEM data may require
substantial extra editing because terrain largely drives water
movement. For example, local sinks occur much more frequently in DEMs
than in real surfaces, arising during data collection or during
processing. Sinks are particularly troublesome when they occur at the
bottom of a larger accumulation area. Modeled water may flow into the
sink but may not flow out, depending on how water accumulation is
modeled, while on the real earth surface the water may flow freely
downhill. Local spikes in the model may push water incorrectly to
surrounding cells, although they typically cause fewer problems than
sinks. Both sinks and spikes must be removed prior to application of
some hydrologic models.
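
A deliberately simplified sketch of sink removal follows. It only handles single-cell pits in the interior of the grid, raising each one to the level of its lowest neighbor; multi-cell depressions need more complete algorithms, which GIS packages provide. The function name and the DEM values are assumptions for illustration.

```python
def fill_single_cell_sinks(dem):
    """Raise interior cells that are lower than all eight neighbors.

    Returns a filled copy of the DEM and the list of (row, col) sinks
    found. Edge cells are left untouched in this simplified version.
    """
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    sinks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = [
                dem[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            lowest = min(neighbors)
            if dem[r][c] < lowest:
                filled[r][c] = lowest  # water can now leave the cell
                sinks.append((r, c))
    return filled, sinks


# Made-up DEM with one spurious pit in the center.
dem = [
    [5, 5, 5],
    [5, 1, 5],
    [5, 4, 5],
]
filled_dem, sinks = fill_single_cell_sinks(dem)
```

Spikes can be treated symmetrically, by lowering a cell that is higher than all of its neighbors.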


Example 2: LANDIS, a Stochastic 
Model of Forest Change 


Many human and natural phenomena are analyzed through spatially
explicit stochastic models. Disease spread, the development of past
societies, animal movement, fire spread, and a host of other
important spatial phenomena have been modeled. All these phenomena
have a random element that substantially affects their behavior.
Events too obscure or complex to predict may cause large changes in
the system action or function. For example, wind speed or dryness on
a given day dramatically affects fire spread, yet these phenomena are
notoriously difficult to predict. Spatially explicit, stochastic
models allow us to analyze the relative importance of component
inputs and processes, and the nature and variability of system
response. Is it stochastic variation in wind, fuel amount, or fuel
type that is most responsible for the variable nature of fire spread?
We will discuss one spatial stochastic model, LANDIS (LANdscape
DiSturbance), that incorporates techniques used in a wide range of
models (Figure 13-23).


Forest vegetation changes through time. Change may be caused by the
natural aging and death of a group of trees, replacement by other
species, or periodic disturbances such as fire, windstorms, logging,
insects, or disease outbreaks. Because trees are long-lived
organisms, the composition and structure of forests often change on
temporal scales exceeding a normal human life span. Actions today may
substantially alter the trajectory of future change, and so we need
to analyze how past actions have led to current forest conditions,
and how present actions will alter future conditions.


Forest disturbance and change are important spatial phenomena for
many reasons. Humans are interested in producing wood and fiber,
preserving rare species, protecting clean water supplies and fish
spawning areas, protecting lives and property from wildfires, and
enjoying forest-based recreation.

Forest change is inherently a spatial phenomenon. Fires, diseases,
and other disturbances travel across space. The distribution of
current forests largely affects the location and species composition
of future forests. Seeds disperse through space, aided by wind and
water or carried by organisms.
Physical and biotic characteristics that 


Some plants are better adapted to grow under existing forests, while
others are aided by disturbances that open the canopy. Some species
change soil or understory conditions in ways that prevent other
species from growing beneath them. Plant succession, the replacement
of one group of plants or species by another through time, is
substantially affected by the current forest distribution and
structure. It is not surprising that many process-based models of
forest change incorporate spatial data.

While forests often look like a cohesive whole, they are extremely
heterogeneous in space, and this complicates our understanding and
predictions of forest change. Tree species, size, age, soils, water
availability, and other factors change substantially over very short
distances. Each forest stand is different, and we struggle to
represent these differences. Given the long time scales, broad
spatial scales, and inherent spatial variability of forests, many
organizations have developed spatial models integrated to some extent
with GIS.


LANDIS is an example of a spatially explicit, process-based forest
dynamics model. LANDIS has been developed by Dr. David Mladenoff and
colleagues, and has been applied to forest biomes across the globe.
LANDIS incorporates natural and human disturbances with models of
seed dispersal, plant establishment, and succession through time to
predict forest composition over broad spatial scales and long
temporal scales. LANDIS is notable for the broad areas it may treat
at relatively high resolution, and for its long temporal scales.
LANDIS has been used to model forest dynamics at a 30 m resolution,
over tens of thousands of hectares, and across five centuries.

LANDIS integrates information about forest disturbance and succession
to predict changes in forest composition (Figure 13-23). As noted
above, succession is the replacement of species through time.
Succession is common in forests, for example, when fast-growing,
light-demanding tree species colonize a disturbed site, and are in
turn replaced by more shade-tolerant, slower growing species. These
shade-tolerant species may be self-replacing in that their seeds
germinate and seedlings survive and grow in the dense shade. Small
gaps from canopy damage or tree deaths allow small patches of light
to reach these shade-tolerant seedlings, enabling them to reach the
upper canopy. This self-replacement can result in a stable,
same-species stand over long time periods. This cycle may be broken
by fire, windthrow, logging, or other disturbance events that open up
a stand to a broader range of species. LANDIS simulates large
heterogeneous landscapes, incorporates the interactions of dominant
tree species, and includes spatially explicit representations of
ecological interactions.


LANDIS Design Elements 


The design of LANDIS is driven by the overall objectives for the
model, simulating forest disturbance and succession through time.
LANDIS also satisfies a number of other requirements. LANDIS readily
integrates satellite data sets and other appropriate spatial data,
and it simulates the basic processes of disturbance, stand
development, seed dispersal, and succession in a spatially explicit
manner.

LANDIS is an object-oriented model. Specific features or processes
are encapsulated in objects, and object-internal processes are
isolated as much as possible from other portions of the model. As an
example, there is a SPECIE object that encapsulates most of the
important information and processes for each tree species included in
the model. Each instance of a SPECIE has a name, for example,
"Aspen," and other characteristics such as longevity, shade
tolerance, or age to maturity, as well as methods for birth, death,
and other actions or characteristics. Because these characteristics
and processes are encapsulated in a SPECIE object, they may be easily
changed as new knowledge becomes available. Many models are
incorporating this object-oriented design, because it simplifies
maintenance and modifications.


LANDIS uses a raster data model that eases the entry of classified
satellite imagery, elevation, and other data sets reflecting
short-range environmental and forest species variation. Interactions
such as seed dispersal, competition, and fire spread are explicitly
modeled for each species occupying each grid cell.

LANDIS tracks the presence of age classes (termed cohorts) for a
number of species in each cell and through time. The model begins
with an initial condition: the distribution of species by age class
across the landscape. Ten-year age classes are currently represented.
The longevity, age of initial seed production, seed dispersal
distance, shade tolerance, fire tolerance, and ability to sprout from
damaged stumps or roots are recorded for each species. On undisturbed
sites, cohorts age until they reach their longevity, at which point
these older cohorts "die" and disappear from the cell. Younger
cohorts may then appear, depending on the availability of seed.
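
The cohort bookkeeping can be sketched as follows. This is not LANDIS source code: the `Species` class, the `age_cell` function, and the longevity and maturity values are hypothetical, chosen only to show how species traits are encapsulated in an object and how ten-year cohorts age and die within one cell.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Species:
    """Traits bundled in one object, loosely echoing the SPECIE idea."""
    name: str
    longevity: int     # years; cohorts older than this die
    maturity_age: int  # age of initial seed production, years


def age_cell(cohorts: Dict[str, Set[int]], species: Dict[str, Species],
             step: int = 10) -> Dict[str, Set[int]]:
    """Age every cohort in one cell by `step` years; drop cohorts past longevity."""
    aged = {}
    for name, ages in cohorts.items():
        survivors = {a + step for a in ages if a + step <= species[name].longevity}
        if survivors:  # a species with no surviving cohorts leaves the cell
            aged[name] = survivors
    return aged


# Made-up species traits and one cell's initial cohorts (ages in years).
species = {
    "aspen": Species("aspen", longevity=100, maturity_age=20),
    "maple": Species("maple", longevity=300, maturity_age=40),
}
cell = {"aspen": {10, 100}, "maple": {50}}

next_cell = age_cell(cell, species)
```

Establishment of new cohorts, which depends on seed availability from this and nearby cells, would be a separate step applied after aging.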

The spatially explicit representation of seed sources and dispersal
is an improvement of LANDIS over many earlier forest succession
models. Previous models typically assumed constant or random seed
availability. LANDIS is representative of spatially explicit models
in that the specific locations of a process affect that process.
Disturbed sites may be occupied by seedlings from a disturbed cell or
nearby cells, or by sprouting from trees in a cell prior to
disturbance. Cells cycle through the species establishment,
succession, disturbance, and mortality processes (Figure 13-24).

The effects of such site characteristics as soil and topography on
species establishment and interactions are also represented in
LANDIS. For example, establishment coefficients are used to represent
the interaction between site characteristics and species
establishment. These coefficients vary by land type. Fire severity
also varies by land type, as may seedling survival. Elevation,
aspect, soils, and other spatial data are used as input to the
spatial model.
Figure 13-24: Basic spatial data and processes represented in LANDIS.


Fire and wind disturbances are simulated based on historical records
of disturbance sizes, frequencies, and severities. Disturbances vary
in these properties across the landscape. For example, wind
disturbances may be more frequent and severe on exposed ridges, and
fires less frequent, less intense, and smaller in wetlands.
Disturbances are stochastically generated, but their variability
depends on landscape variables. For example, fires are generated more
frequently on dry upland sites.
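
A hedged sketch of stochastic disturbance generation that depends on a landscape variable is given below. The land types, the per-step ignition probabilities, and the function name are invented for illustration and do not come from LANDIS; real models also simulate fire size, spread, and severity.

```python
import random


def simulate_ignitions(land_type, ignition_prob, rng):
    """Return a grid of booleans: True where a fire starts this time step.

    Each cell draws its own random number, and the chance of ignition
    depends on the cell's land type.
    """
    return [
        [rng.random() < ignition_prob[lt] for lt in row]
        for row in land_type
    ]


# Made-up land type layer and ignition probabilities per time step.
land_type = [
    ["upland", "wetland"],
    ["upland", "upland"],
]
ignition_prob = {"upland": 0.2, "wetland": 0.02}

rng = random.Random(42)  # fixed seed so this run can be repeated
fires = simulate_ignitions(land_type, ignition_prob, rng)
```

Over many time steps, upland cells in this sketch ignite about ten times as often as wetland cells, while the timing and location of any single fire remain random.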

LANDIS has been applied to a number of forest science and management
problems, including the effects of climate change on forest
composition and production, the impacts of changing harvesting
regimes on landscape patterns, and regional forest assessments
(Figure 13-25).

Hundreds of other spatially and temporally dynamic models have been
developed, and many more are currently under development. As spatial
data collection technologies improve and GIS become more powerful,
spatio-temporal models are becoming standard tools in geographic
science, planning, and resource management.


Summary

Spatial analysis often involves the development and use of spatial
models. These models can help us understand how phenomena or systems
change through space and time, and they may be used to solve
problems. In this chapter we described cartographic models, simple
spatial models, and spatio-temporal models.

Cartographic models often combine several
data layers that represent criteria and
constraints related to establishing the
suitability of locations for a given use. Data
layers are combined through the application of
a sequence of spatial operations including
overlay, reclassification, and buffering. The
cartographic model may be specified with a
flowchart, a diagram representing the data
layers and sequence of spatial operations.
Cartographic models are static in time
relative to the other model types.
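
As a minimal sketch of how a cartographic model chains operations, the Python below reclassifies one small raster and overlays it with another. The 3 x 3 grids, the 10% slope threshold, and the helper names are hypothetical; a real analysis would use GIS software or a raster library rather than nested lists.

```python
# Tiny 3 x 3 rasters stored as lists of rows (hypothetical values).
slope_percent = [[2, 5, 12],
                 [3, 8, 20],
                 [1, 4, 6]]
# Assumed output of an earlier buffer step: True where a cell is near a road.
near_road = [[True, True, False],
             [True, True, True],
             [False, True, True]]

def reclassify(grid, predicate):
    """Reclassification: map each cell value to True/False."""
    return [[predicate(value) for value in row] for row in grid]

def overlay_and(grid_a, grid_b):
    """Overlay: a cell is True only where both input layers are True."""
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]

gentle_slope = reclassify(slope_percent, lambda s: s < 10)
suitable = overlay_and(gentle_slope, near_road)
for row in suitable:
    print(row)
```

The order of the function calls mirrors the boxes and arrows of a flowchart: each intermediate layer is the input to the next operation.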


Simple spatial models typically apply a
set of equations to variables expressed as
simple scalars and spatial layers. These
models are often expressed as a set of
equations. These equations are estimated from
data at a set of observations at single points or
across subareas, and then usually applied
across broader (or different) geographic
areas.


Spatio-temporal models explicitly represent
the changes in important phenomena
through time within the model. These models
are typically more detailed and less flexible
than cartographic models, in part
because spatio-temporal models often
include some representation of process. For
example, many spatio-temporal models have
been developed to model the flow of water
through a region, and these models incorporate
equations regarding the physics of water
transport and movement.

As more people use GIS, they gain
access to spatial models run with some
combination of GIS, programming languages, and
other analytical tools. Many spatial models
only use spatial data developed by GIS or
use GIS for visualization, but a growing
number are either integrated with GIS or
being done entirely in GIS as the technology
improves.
Study Questions


13.1 - Provide an example of a cartographic model, including the criteria and a
flowchart of the steps used to apply the model.


13.2 - Why must the criteria be refined in many cartographic modeling processes? 
13.3 - What do we mean when we say that most cartographic models are temporally 
static? 


13.4 - What is a discrete vs. continuous weighting in an input layer when combining
layers in a cartographic overlay? How do you develop a reasonable continuous ranking
function, and justify the shape of the curve vs. the level of the input variable?


13.5 - Match the output layers B, C, and D to the appropriate reclassification graphs
Q-W when applied to the original DEM, A. Note that a hillshade surface is
superimposed to aid in visualization.
13.7 - The figure below depicts four flowcharts of cartographic models to find areas
most suitable for a new park. Sites are preferred that meet all of the following criteria:

Within 0.5 miles of Census polygons with a density of more than 120 persons per
square mile;

Greater than 0.5 miles from an existing park;

Current land use of grass or vacant.

Select the flowchart which best approximates the proper analysis, given the described
data. For each other flowchart, list at least one primary way it is inferior to the chosen
method. Note that some minor intermediate steps are omitted or subsumed into
13.8 - The figure below depicts four flowcharts of cartographic models to find areas
most suitable for new parks. Sites are preferred that meet all of the following criteria:

Within 0.5 miles of Census polygons with a density of more than 120 persons per
square mile;

Greater than 0.5 miles from an existing park;

Within 0.5 miles of a school.

Select the flowchart that best approximates the proper analysis, given the described
data. For each other flowchart, list at least one primary way it is inferior to the chosen
method. Note that some intermediate steps are omitted for all flowcharts, so do not
cite a step omitted in both the best and alternate flowcharts.
13.9 - The figure below depicts four flowcharts of cartographic models to find areas
most suitable for wild sheep habitat. Sites are preferred that meet all of the following
criteria:

Non-forest; Slope greater than 20%; Elevation above 2,500 m; All areas within 5 km
of grassland or meadow; Each contiguous polygon larger than 1000 acres.

Note that some intermediate steps are omitted for all flowcharts, so do not
cite a step omitted in both the best and alternate flowcharts.
13.10 - The figure below depicts four flowcharts of cartographic models to find areas
most suitable for a desalinization plant. Sites are preferred for which all parts meet all
of the following criteria:

All areas within 0.5 km of trunk water pipe;
Within single parcel;
Areas larger than 100 ha.

Select the flowchart that best approximates the proper analysis, given the described
data. For each other flowchart, list at least one primary way it is more likely to lead to
an error. Note that some small intermediate steps are omitted for all flowcharts, so do
not cite a step omitted in both the best and alternate flowcharts.




14 Data Standards and Data Quality


Introduction 


A standard is an established or sanctioned
measure, form, or method. It is an
agreed-upon way of doing something. Spa- 
tial data and analysis standards are import- 
ant because of the range of organizations 
producing and using spatial data, and 
because these data are often transferred 
among organizations. Data standards facili- 
tate a common understanding of the compo- 
nents of a spatial data set, how data were 
developed, and the utility and limitations of 
these data. 


GIS practitioners use several types of
standards. Data standards are used to format,
assess, document, and deliver spatial
data. Interoperability standards identify
how spatial data are served between
heterogeneous networks of software and hardware
systems, for example, between wireless
mobile devices and shared databases. Analysis
standards ensure that the most appropriate
methods are used and that the spatial
analyses provide the best information possible.
Professional or certification standards
establish the education, knowledge, or
experience of the GIS analyst, thereby improving
the likelihood that the technology will be
used appropriately.

The Federal Geographic Data Committee
(FGDC) is the leading government
organization in the United States in defining data
standards. The FGDC focuses on the
National Spatial Data Infrastructure (NSDI)
in the United States, a set of resources to aid
the creation and sharing of digital geographic
data. Standards are developed
through a set of processes, from proposals
through drafts to an FGDC-adopted standard.
Standards may be modified through an
update process. Currently, there are standards
for classification, content (Utilities Data Content
Standard), metadata (data about data), and
data transfer. The most recent NSDI plan at
this writing covers the 2021-2024 period.
Details are at www.fgdc.gov.

There are parallel initiatives in many 
countries, information on which can be 
found through the International Spatial Data 
Standards Commission. The Commission
currently serves as a clearinghouse and gateway
to national standards across the world.


The International Organization for
Standardization (ISO) organizes international standards,
and sponsors the ISO/TC211 standards
(www.isotc211.org). These specify
ways to store and represent spatial and
related information, services and data
management, processing, transferring, and
presenting information. The standards are
organized as various projects, for example,
standards for representing coordinates, testing
standards, or for measuring data quality.
Many standards are in active development,
and as these standards become
stable, they will ease data and information
transfer among different GIS software,
organizations, and through time.

The Open Geospatial Consortium
(OGC) is an ad hoc, self-selected group of
over 500 companies, research institutions,
government bodies, and individuals dedicated
to developing interoperability standards.
Interoperation problems are
identified, such as general difficulties in
accessing time-varying spatial location data
through a distributed wireless network, and
standards for access proposed. These are
reviewed, discussed, amended, and adopted.

Web mapping services (WMS) standards
are an example of OGC initiatives.
Web mapping services allow GIS software
to access data across the internet as if they
were stored on the local hard disk. A GIS
program or utility "maps" the WMS to the
local computer, meaning it may access the
data with the same protocols as if it were
stored locally, without downloading a
permanent copy to store on the local hard disk.
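
To make the idea of a standardized request concrete, here is a minimal sketch that assembles a WMS 1.3.0 GetMap request as a URL. The parameter names (VERSION, REQUEST, LAYERS, STYLES, CRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS specification; the server address, the layer name, and the helper function are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_getmap_url(base_url, layer, bbox, width, height):
    """Assemble a WMS 1.3.0 GetMap URL (illustrative helper, not a library call)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                # empty string requests the default style
        "CRS": "CRS:84",             # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),  # min lon, min lat, max lon, max lat
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)

# Hypothetical server and layer name, used only for illustration.
url = build_getmap_url("https://example.org/wms", "landcover",
                       (-94.0, 44.0, -93.0, 45.0), 512, 512)
print(url)
```

Because every compliant server understands the same parameters, client software can request a map image from any of them without knowing how the data are stored behind the service.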

Web services such as WMS are important
for the future of cloud-based computing,
where data, programs, and processing are
seamlessly distributed on computers
connected across the web. Cloud-based geospatial
computing is inherently dependent on
robust, well-defined interoperability standards
such as those being developed by the
OGC. Standards identify data formats and
content, parts and naming, metadata, how
connections are made and data are passed
between programs across distributed networks,
and error checking in transfer. Standards
allow data to be combined across
different organizations, with local storage
and access form and protocols, and a standard
way of serving up data to others
through a service.

IndoorGML is an example of a
newer OGC standard, under development to
define data formats for interior building spatial
data. Three-dimensional data for building
interiors are useful to real estate, law
enforcement, design, and construction
applications, among others. Various software vendors
and research organizations have
developed 3D formats, but data sharing is
inhibited without standards. The OGC has
developed such a data standard, with the
participation of software, research, business,
and government representatives.

It has proven more difficult to develop
professional and analysis standards that are
inclusive across all disciplines. Standard
methods for one discipline may be
inappropriate for another. For example, acceptable
data collection methods for cadastral surveyors
may be different than those for foresters,
due to differences in accuracy and attribute
requirements.

The GIS Certification Institute (GISCI)
has developed a GIS Professional (GISP)
certificate that is gaining popularity. It seeks
to certify qualifying levels for education,
professional experience, knowledge, and
ethics, and to help guide individual professional
development and continuing education.
It offers exam-based qualification.

There are parallel efforts to develop a
set of standards in U.S. GIS professional
practice. Known as competency models, they
define a set of skills considered essential for
professional practice, and have been
developed for a growing number of industries.
All have a common foundation of basic
personal and workplace competencies, with
industry- and then occupation-specific skills
built on top.


The Geospatial Technology Competency Model

The Geospatial Technology Competency
Model (Figure 14-1) identifies a set of
core and industry sector geospatial abilities.
The Competency Model identifies examples
of over 40 "Critical Work Functions" that
geographic technology professionals are
commonly expected to master and use in
their careers, and the background knowledge
on which these Critical Work Functions are
based. This competency model is based in
part on the Geographic Information Science
Body of Knowledge, first developed by the
University Consortium for
Geographic Information Science (UCGIS)
and published by the Association of American
Geographers in 2006. Critical work
functions include operations in basic geodesy,
data collection systems, data structures,
GIS operation and programming, analytical
methods, cartography, the place of geographic
information science and technology
in society, and organization and institutions.
A set of higher-level requirements is noted
for specific occupations.


Spatial Data Standards 


Spatial data standards can be defined
as methods for structuring, describing, and
delivering spatially referenced data. Spatial
data standards may be categorized into four
areas: media standards, format standards,
accuracy standards, and documentation standards.
All are important, although the last
two are substantially more complex than the
first two.

Media standards refer to the physical
form in which data are transferred. They
define specific formats for CD-ROM, magnetic
tape, optical or solid state storage, or
some proprietary drive or other media type.
Standardized formats are specified by the
International Organization for Standardization (ISO).

Format standards specify data file components
and structures. A format standard
establishes the number of files used to store
a spatial data set, as well as the basic components
contained in each file. The order, size,
and range of values for the data elements
contained in each file are defined. Information
such as spacing, variable type, and file
encoding may be included.

Format standards aid in the practical
task of transferring data between computer
systems, either within or between organizations.
Producers and users may not use the
same hardware or GIS software. The interchange
between different software systems
is aided by general, standard forms in which
data may be delivered.


Many government or vendor formats
have become widely supported because data
are commonly delivered using these formats.
For example, the U.S. government supports
the Spatial Data Transfer Standard (SDTS).
This format specifies the logic, format, and
encoding for the transfer of raster, vector, and
topological spatial data. ESRI shapefiles
(a cluster of files including .shp, .shx, and
.dbf) are a commonly supported vector format,
and many organizations transfer data
using them. These proprietary formats are
not truly standards because the formats may
be changed by the vendors that created them.
Until data formats are agreed to by a
standardizing body, ambiguity in form will hinder
interpretation and hence use.

Spatial data accuracy standards document
the quality of the positional and attribute
values. Knowledge of data quality is
crucial to the effective use of GIS, but we
are often remiss in documenting spatial data
quality. This is due in part to the cost of
adequately estimating the errors in our spatial
data sets. Field sampling is expensive. Data
production is often pushed to the limit of available
resources, and the documentation of data
accuracy is left incomplete. Adherence to spatial
data accuracy standards ensures we assess
and communicate spatial data quality in a
well-defined, established manner.

Documentation standards define how
we describe spatial data. Data are derived
from a set of original measurements taken by
specific individuals or organizations at a
specified time. Data documentation standards
are an agreed-upon way of describing
the source, development, and form of spatial
data. When documentation standards are
used, they ensure a complete description of
the data origin, methods of development,
accuracy, and delivery formats. Standards
allow any potential user to assess the
appropriateness of these data for an intended task.

Data quality standards add value to our
data. There are many ways to describe data
positional and attribute error. An incomplete
description of spatial data quality may not
allow a user to judge if the data are acceptable
for an intended application. A data
quality standard becomes familiar through
use. We may know what levels of average
error are likely to result in unacceptable
data. The standard allows us to compare two
data sets in light of this past experience.
Data Accuracy


An accurate observation reflects the true 
shape, location, or characteristics of the phe- 
nomena represented in a GIS. When the con- 
cept of accuracy is applied to spatial 
variables, it is a measure of how often or by 
how much our data values are in error. Accuracy
may be reported as a frequency, for
example, when we report that 20% of the 
land cover class labeled as cropland is actu- 
ally perennial grasses. Alternatively, accu- 
racy may be expressed as an average error 
magnitude; for example, light poles may be 
displaced on average by 12.4 m from their 
true locations. 

Inadequacies in our spatial data model
may cause spatial data error. When we use a
raster data set with a fixed cell size, we have
set a limit on our positional accuracy. The
raster model assumes a homogeneous pixel.
If more than one category or value for a variable
is found in the pixel, then the attribute
value may be in error. This generalization
error may also occur in vector data sets. Any
feature smaller than the minimum mapping
unit may not be represented. Vector data sets
may poorly represent gradual changes, so
there can be increased attribute error near
vector boundaries. Digital soils data are
often provided in a vector data model, yet
the boundaries between soil types are often
not discrete, but change over a zone of a few
to several meters.

Errors are often introduced during spa- 
tial data collection. Many positional data are 
currently collected using GNSS technolo- 
gies. The spatial uncertainty in GNSS posi- 
tions described in Chapter 5 is incorporated 
into the positional data. Feature locations 
derived from digitized maps or aerial photo- 
graphs also contain positional errors due to 
optical, mechanical, and human deficiencies.
Lenses, cameras, or scanners may distort 
images, positional errors may be introduced 
during registration, or errors may be part of 
the digitization process. Blunders, fatigue, or 
differences among operators in abilities or 
attitudes may result in positional uncertainty. 


Spatial data accuracy may be degraded
during laboratory processing or data reduction.
Mis-copies during the transcription of
field notes, errors during keyboard entry, or
mistakes during data manipulation may alter
coordinate values used to represent a spatial
data feature. Improper representation in the
computer may cause problems, such as
rounding errors when multiplying large
numbers.


Data may also be in error due to changes
through time (Figure 14-2). The world is
dynamic, while our representation in a spatial
data set captures a snapshot at the time of
data collection. Vegetation boundaries may
be altered by fire, logging, construction,
conversion to agriculture, or a host of other
human or natural disturbances. Even in
instances where positions are static, attributes
may change through time. A two-lane
gravel road may be paved or widened, causing
attributes to be in error. Layers should
have a recommended update interval that
may vary by type. Elevation, geology, and
soils may be updated rarely, and still maintain
their accuracy. Vegetation, population,
land use, or other factors change at faster
rates, and should be updated more frequently
if they are to remain accurate.


Documenting Spatial Data Accuracy

We must unambiguously identify true
conditions if we are to document spatial data
accuracy. For example, a road segment may
be completely paved, or not. The data record
for that road segment is accurate if it
describes the surface correctly, and inaccurate
if it does not. However, in many cases,
the truth is not completely known. The locations
for the above roads may be precisely
surveyed using the latest carrier phase GNSS
methods. Road centerlines and intersections
may be known to the nearest 0.5 cm. While
this is a very small error, this represents
some ambiguity in what we deem to be the
truth. Establishing the accuracy of a data set
requires we know the accuracy of our measure
of truth.


In most cases, the truth is defined based
on some independent, higher order measurements.
In our roads example, we may desire
that our data layer be accurate to 15 m or
better. Based on this scale, the 0.5 cm accuracy
from our carrier phase GNSS measurement
may be considered true.


Accuracy is most reliably determined by
a comparison of true values to the values
represented in a spatial data set. This
requires we collect data at an adequate set of 
sample locations. True values are collected 
at these sample locations. Corresponding 
values are collected for the digital spatial 
data. The true and data values are compared, 
errors calculated, and summary statistics 
generated. 

The source for our truth, the sampling
method, our method for calculating error,
and the summary statistics we choose will
depend on the type of spatial data that are to
be evaluated. Positional data will be
assessed using different methods than attribute
data. Nominal attribute data (e.g., the
type of land cover) will be assessed differently
than a measurement recorded on a continuous
range (e.g., purchase price of a
parcel).

There are four primary ways we
describe spatial data accuracy: positional
accuracy, attribute accuracy, logical consistency,
and completeness (Figure 14-3).
These four components may be complemented
with information on the lineage of a
data set to define the accuracy and quality of
a data set. These components are described
in turn below.

Positional accuracy describes how closely
the locations of objects represented in a digital
data set correspond to the true locations
of the real-world entities. In practice, truth
is determined from some higher-order positional
data.

Attribute accuracies are usually reported as a
mean error for attributes measured on interval/ratio
scales, and as percentages or proportions
accurate for ordinal or categorical attributes.

[Figure 14-3 panel labels: a) Positional accuracy, b) Attribute accuracy]

Logical consistency reflects the presence,
absence, or frequency of inconsistent
data. Tests for logical consistency often
require comparisons among themes, for
example, all roads occur on dry land. This is
different than positional accuracy in that
both the road and the lake locations may
contain positional error. However, these
errors do not cause impossible or illogical
juxtapositions. Logical consistency may also
be applied to attributes, for example, wetland
soils erroneously listed as suitable for
construction, or lakes with zero depth.
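
Attribute consistency rules like these are easy to express as automated checks. The sketch below is a hypothetical illustration: the record layout, field names, and function name are assumptions, and a production check would run against the actual attribute tables in a GIS or database.

```python
# Hypothetical attribute records for two themes.
lakes = [
    {"id": 1, "name": "North Lake", "depth_m": 12.5},
    {"id": 2, "name": "Mud Lake", "depth_m": 0.0},
]
soils = [
    {"id": 10, "soil_type": "wetland", "suitable_for_construction": True},
    {"id": 11, "soil_type": "upland loam", "suitable_for_construction": True},
    {"id": 12, "soil_type": "wetland", "suitable_for_construction": False},
]

def find_inconsistencies(lakes, soils):
    """Return (theme, id, reason) tuples for records that break a rule."""
    problems = []
    for lake in lakes:
        if lake["depth_m"] <= 0:  # a lake should have a positive depth
            problems.append(("lake", lake["id"], "non-positive depth"))
    for soil in soils:
        if soil["soil_type"] == "wetland" and soil["suitable_for_construction"]:
            problems.append(("soil", soil["id"],
                             "wetland soil marked suitable for construction"))
    return problems

problems = find_inconsistencies(lakes, soils)
for theme, record_id, reason in problems:
    print(theme, record_id, reason)
```

The proportion of records flagged by such rules gives one simple way to report the frequency of logically inconsistent data.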

Completeness describes how well the
data set captures all the features. A buildings
data layer may omit certain structures, and
the frequency of these omissions reflects an
incomplete data set.


Data sets may be incomplete because of
generalizations during map production or
digitizing. For example, a minimum mapping
unit may be set at 2 ha when compiling
a vegetation map. Isolated small pastures
scattered through the forest may not be
represented because they are only slightly larger
than this minimum mapping unit, and
erroneously they are not represented in the data
layer.


Lineage describes the sources, methods,
timing, and persons responsible for the
development of a data set. Lineage helps
establish bounds on the other measures of
accuracy described above, because knowledge
about certain primary data sources
helps define the accuracy of a data set.
Positional Accuracy


Positional accuracy measures how close
a database representation of an object is to
the true value. Accurate positions have small
errors. Small is defined subjectively, but
may at least be quantified.

Precision refers to the consistency of a
measurement method. Precision is usually
defined in terms of how dispersed a set of
repeat measurements are from the average
measurement. A precise measurement system
provides tightly packed results. Precise
digitizing means we may repeatedly place a
point in the same location.

Accuracy and precision are often confused,
but they are two different characteristics,
both desirable, that may change
independently. A set of measurements may
be precise but inaccurate. Repeat measurements
may be well clustered, meaning they
are precise, but they may not be near the true
value, meaning they are inaccurate. A bias
may exist, defined as a systematic offset in
coordinate values. A less precise process
will result in a set of points that are more
widely spread. However, their average error
may be substantially less; therefore, the set is
more accurate.
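
One way to separate the two ideas numerically is to measure the offset of the mean position from the true position (bias, related to accuracy) and the average distance of the points from their own mean (spread, related to precision). The sketch below is an illustration under those assumed definitions; the coordinates are invented and the function name is hypothetical.

```python
import math

def bias_and_spread(points, truth):
    """Return (bias, spread) for a set of repeat (x, y) measurements.

    bias   = distance from the mean measured position to the true position
    spread = average distance of the measurements from their own mean
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    bias = math.hypot(mean_x - truth[0], mean_y - truth[1])
    spread = sum(math.hypot(x - mean_x, y - mean_y) for x, y in points) / n
    return bias, spread

truth = (100.0, 100.0)
# Tightly clustered but offset: precise, not accurate.
precise_but_biased = [(110.0, 110.0), (111.0, 110.0), (110.0, 111.0), (111.0, 111.0)]
# Widely scattered around the true location: accurate on average, not precise.
accurate_but_imprecise = [(90.0, 100.0), (110.0, 100.0), (100.0, 90.0), (100.0, 110.0)]

print(bias_and_spread(precise_but_biased, truth))
print(bias_and_spread(accurate_but_imprecise, truth))
```

The first set has a large bias and a small spread, the second a near-zero bias and a large spread, showing that the two properties can vary independently.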

Figure 14-4 illustrates the difference
between accuracy and precision. Four digitizing
sessions are shown. The goal is to
place several points at the center of the
cloverleaf intersection in Figure 14-4. The
upper left panel shows a digitizing process
that is both accurate and precise. Points,
shown as light-colored circles, are clustered
tightly and accurately over the intended
location.


The upper right panel of Figure 14-4
shows points that are precisely digitized
(tightly clustered), but not accurately
located. This might be due to an equipment
failure or some problem in registration: the
operator may have made some blunder in
photo registration and introduced a bias.


The lower left panel of Figure 14-4 
shows points that are accurately but impre- 
cisely digitized. The average location for 
these points is quite near the desired position,
the center of the cloverleaf intersection,
even though individual points are widely 
scattered. These points are not very close to 
the mean value and so precision is low, even 
though accuracy is high. 

The panel at the lower right of Figure 
14-4 shows points with positions that are 
both imprecise and inaccurate. The mean 
value is not near the true location, nor are the 
values tightly clustered. 


The thresholds that constitute high accuracy
or precision are often subjectively
defined. A duffer may consider as accurate
any golf shot that lands on the green. This
definition of accuracy may be based on thousands
of previous attempts. For a professional
golfer, anything farther than 2 m from
the hole may be an inaccurate shot. In a similar
fashion, the spatial accuracy sought by a
land surveyor may be different than that sought
by a federal land manager. Cadastral surveys
require the utmost in accuracy because people
tend to get upset when there is a material,
permanent trespass, as when a neighbor
builds a garage on their land. Lower accuracy
is acceptable in other applications: for
example, a statewide map defining vegetation
type may be acceptable even though
boundaries are off by tens of meters.


The mean error and an error frequency
threshold are the statistics most often used to
document positional data accuracy. Consider
a set of wells represented as point features in
a spatial data layer. Suppose that after we
have digitized our well locations, we gain
access to a GNSS system that effectively
gives us the true coordinate locations for
each well. We may then compare these well
locations to the coordinate locations in our
database. We begin by calculating the distance
between our true and database coordinates
for each well. This leaves us with a list
of errors, one associated with each well location
(Figure 14-5). Distance is measured
using the Pythagorean formula with the true
and database coordinates. Distances are
always positive because of the form of the
formula.
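
The distance calculation for the wells can be sketched as follows. The well names and coordinates are invented for illustration, and the function name is hypothetical; the distances use the Pythagorean formula via `math.hypot`.

```python
import math

# Hypothetical true (GNSS) and database coordinates, in meters.
true_coords = {"well_1": (500.0, 200.0), "well_2": (620.0, 340.0), "well_3": (710.0, 150.0)}
db_coords = {"well_1": (503.0, 204.0), "well_2": (620.0, 328.0), "well_3": (702.0, 156.0)}

def error_distances(true_coords, db_coords):
    """Distance between true and database location for each feature."""
    errors = {}
    for name, (x_true, y_true) in true_coords.items():
        x_db, y_db = db_coords[name]
        # hypot returns sqrt(dx^2 + dy^2), which is never negative
        errors[name] = math.hypot(x_db - x_true, y_db - y_true)
    return errors

errors = error_distances(true_coords, db_coords)
print(errors)  # {'well_1': 5.0, 'well_2': 12.0, 'well_3': 10.0}
```

Because the coordinate differences are squared before they are summed, the direction of the offset does not matter and every error distance is zero or positive.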


We may compute the mean error by
summing the errors and dividing the sum by
the number of observations. This gives us
our average error, a useful statistic somewhere
near the midpoint of our errors. We
are often interested in the distribution of our
errors, and so we also commonly use a frequency
histogram to summarize our spatial
error. The histogram is a graph of the number
of error observations by a range of error
values, for example, the number of error values
between 0 and 1, between 1 and 2,
between 2 and 3, and so on for all our observations.
The graph will indicate the largest
and smallest errors, and also give some indication
of the mean and most common errors.
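
A mean error and a simple frequency histogram can be computed as in the sketch below. The error values are invented, the 1 m bin width follows the example in the text, and the function names are hypothetical.

```python
import math

def mean_error(errors):
    """Sum of the errors divided by the number of observations."""
    return sum(errors) / len(errors)

def error_histogram(errors, bin_width=1.0):
    """Count errors per bin; keys are the lower edge of each bin."""
    counts = {}
    for e in errors:
        lower_edge = math.floor(e / bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))

errors = [0.4, 1.2, 1.7, 2.5, 0.9, 1.1, 3.8, 1.5]  # hypothetical errors, in meters
print(mean_error(errors))
print(error_histogram(errors))  # {0.0: 2, 1.0: 4, 2.0: 1, 3.0: 1}
```

Here the most common errors fall between 1 and 2 m, while the single error near 3.8 m shows up as an isolated bin at the right of the histogram.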
Examples of error frequency distributions
for two different data sets are shown in
Figure 14-6. Each plot shows the frequency
of errors across a range of error distances.
For example, the top graph shows that
approximately 1 percent of errors have a
value of near 1.5 m, and the mean error is
13 m.


The mean error value does not indicate
the distribution, or spread, of the errors. Two
data sets may have the same mean error but
one may be inferior because it has
more large errors. The bottom graph in
Figure 14-6 has the same mean error, 13 m, as
the top graph. Note that the errors have a
narrower distribution, meaning the errors are
clumped closer to the mean than in the top
graph, and there are fewer large errors.
Although the mean error is the same, many
would consider the data represented in the
bottom graph of Figure 14-6 to be more
accurate.
Because the mean statistic alone does
not provide information on the distribution
of positional errors, an error frequency
threshold can be reported. An error frequency
threshold is a value above or below
which a proportion of the error observations
occurs. Figure 14-6 shows the 95% frequency
threshold for two error distributions.
The threshold is placed such that 95% of the
errors are smaller than the threshold and 5%
are larger than the threshold. The top graph
shows a 95% frequency threshold of approximately
21.8 m. This indicates that approximately
95% of the positions tested from a
sample of a spatial database are less than or
equal to 21.8 m from the true locations. The
bottom panel in Figure 14-6 has a 95% frequency
threshold at 17.6 m. This means 5%
of the errors in the second tested database
are larger than 17.6 m from their true location.
If we are concerned with the frequency
of large errors, this may be a better summary
statistic than the mean error.
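
An error frequency threshold can be estimated directly from a sample of error distances. The sketch below uses one simple convention, taking the smallest observed error that at least the chosen proportion of observations do not exceed; other percentile conventions exist, and the function name and sample values are assumptions for illustration.

```python
import math

def error_threshold(errors, proportion=0.95):
    """Smallest observed error such that at least `proportion`
    of the observations are less than or equal to it."""
    ordered = sorted(errors)
    k = math.ceil(proportion * len(ordered))  # number of observations to cover
    return ordered[k - 1]

errors = [float(e) for e in range(1, 21)]  # hypothetical errors: 1, 2, ..., 20 m
threshold = error_threshold(errors)
share_within = sum(e <= threshold for e in errors) / len(errors)
print(threshold, share_within)  # 19.0 0.95
```

With these 20 test points, 19 errors are at or below 19 m, so 95% of the sample falls within the threshold and the remaining 5% exceeds it.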


Accuracy Calculations 


The calculation of point accuracies and
summary statistics is the next step in accuracy
assessment. First, the coordinates of
both the true and data layer positions for a
feature are recorded. These coordinates are
used to calculate a positional difference,
known as a positional error, based on the distance
between the true coordinates and the
data layer coordinates (Figure 14-5). The
true coordinates fall in a different location
than the coordinates derived from the data
layer. Each test point yields an error distance,
e, shown in Figure 14-5 and defined by the
equation:

$$ e = \sqrt{(x_t - x_d)^2 + (y_t - y_d)^2} \qquad (14.1) $$

where $x_t, y_t$ are the true coordinates and $x_d, y_d$
are the data layer coordinates for a point.

The squared error differences are then
calculated, and the sum, average, and root
mean square error (RMSE) statistics
determined for the data set. As previously defined
in this book, the RMSE is:


$$ \mathrm{RMSE} = \sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n}} $$


where e is defined as in equation 14.1, and n
is the number of test points used.
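The two formulas can be sketched in a few lines of Python. This is an
illustration only; the function names and the three test points are
hypothetical, chosen so the error distances are easy to check by hand.

```python
import math

def error_distance(true_xy, data_xy):
    """Equation 14.1: distance between true and data layer coordinates."""
    (xt, yt), (xd, yd) = true_xy, data_xy
    return math.hypot(xt - xd, yt - yd)

def rmse(errors):
    """Root mean square of a list of error distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical (true, data layer) coordinate pairs, in meters
test_points = [
    ((100.0, 200.0), (103.0, 204.0)),  # offset 3, 4  -> e = 5
    ((150.0, 250.0), (150.0, 251.0)),  # offset 0, 1  -> e = 1
    ((300.0, 100.0), (294.0, 92.0)),   # offset 6, 8  -> e = 10
]

errors = [error_distance(t, d) for t, d in test_points]
mean_error = sum(errors) / len(errors)
rmse_value = rmse(errors)

print(f"errors: {errors}")
print(f"mean error: {mean_error:.2f} m, RMSE: {rmse_value:.2f} m")
```

Because the errors are squared before averaging, the RMSE here is
larger than the mean error distance, which is one reason the two
statistics should not be used interchangeably.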

The RMSE is not the same as the average
distance error, nor a "typical" distance
error. The RMSE is a statistic that is useful
in determining probability thresholds for
error. The RMSE is related to the statistical
variance of the positional error. If we assume
the x and y errors follow a bell-shaped
Gaussian curve commonly observed when
sampling, then the RMSE tells us something
about the distribution of distance errors. We
can use knowledge about the RMSE that we
get from our sample to determine the
likelihood of a large or small error. A large
RMSE means the errors are widely spread,
and a small RMSE means the errors are
packed tightly around the mean value.


Figure 14-7 shows an example set of test
points for a road data layer, and an image
backdrop. Prior knowledge leads us to
expect average errors in excess of 6 m. In
this example, we have selected high
resolution GNSS as our true data source. We know
these photographs have a positional error of
less than 15 cm, on average, from metadata.
These images were selected because they
meet our accuracy requirements and are
available for the entire work area.


The display of road locations on top of
the images shows there are substantial
differences in true positions of features and
their representations in the roads data layer.
Any right-angle intersection is a prospective
test point.

The inset in the lower right of Figure 14-7
shows the true point locations relative to
the road intersections. Road centerlines were
digitized. These true locations would be
identified on the images, perhaps by
pointing a cursor at a georeferenced image
displayed on a computer monitor. The data
coordinates would then be extracted for the
corresponding road intersection, and these
two coordinate pairs, the true X, Y and data
X, Y, would be one test point used in
accuracy calculations.

Test points must be clearly identifiable
in both the test data set and in the truth data
set. Points that are precisely, unambiguously
defined are best. For example, we may wish
to document the accuracy of roads data
compiled from medium-scale sources and
represented by a single line in a digital layer.
Right-angle road intersections are preferred
over other features because the positions
represented in the database may be precisely
determined. The coordinates for the precise
center of the road intersections may also be
determined from a higher-accuracy data set,
for example, from digital orthophotographs
or field surveys. Other road features are less
appropriate for test points, including road
intersections at obtuse angles or acute
curves, because there may be substantial
uncertainty when matching the data layer to
true coordinates.

The source of the true coordinate
position should match our minimum accuracy
specification, or be at least an order of
magnitude more accurate than the errors. GNSS
are a common source of truth, as the
accuracy may be set by collection equipment and
methods, but any source of truth that
matches our requirements is acceptable.


Positional Accuracy Standards 


There are three commonly used
standards for map and related digital data
positional accuracy in North America, and
adopted in much of the world. The first is
known as the National Map Accuracy
Standard (NMAS) and was developed by the
U.S. Geological Survey to specify positional
accuracy on hardcopy maps (ASPRS 1990).
It specified a 1/30th of an inch or larger error
on no more than 10% of test points for maps
at a scale of 1:20,000 or larger, which
translates to a 4.2 m (13.9 ft) threshold on a
1:5,000 scale map. Given the current dominance
of digital cartography, the standard might be
best interpreted as applying to spatial errors
at the scale of digital aerial images used in a
data collection effort, but even then, the
errors are quite large relative to current
technological capabilities, so this standard has
little relevance except when producing
hardcopy materials.
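The 4.2 m figure follows from scaling the map-distance tolerance by
the map scale. A minimal sketch of that arithmetic, assuming only the
1/30 inch tolerance quoted above; the function name is hypothetical.

```python
INCH_TO_M = 0.0254   # exact definition of the inch
M_TO_FT = 1 / 0.3048

def nmas_ground_threshold_m(scale_denominator, map_tolerance_in=1 / 30):
    """Ground distance (m) represented by a map-distance tolerance."""
    return map_tolerance_in * scale_denominator * INCH_TO_M

threshold_m = nmas_ground_threshold_m(5000)   # 1:5,000 scale map
threshold_ft = threshold_m * M_TO_FT

print(f"{threshold_m:.1f} m ({threshold_ft:.1f} ft)")
```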


The American Society of
Photogrammetry and Remote Sensing has developed a
standard for digital orthoimagery and
derived elevations (ASPRS 2015). The
standard specifies methods for reporting
accuracy based on a measured RMSE. An x or y
error threshold may be established, e.g., the
RMSE must be less than 5 cm for a given
orthophoto mosaic, and assuming a normal
error distribution, this threshold is
multiplied by 1.414 to give a radial error
(e.g., 7.1 cm, or 1.414 × 5), or multiplied
by 2.448 for specification at a 95%
horizontal accuracy confidence interval (e.g., 12.2
cm, from 2.448 × 5). It also establishes
reporting requirements for vertical accuracy, with
different standards for vegetated and
non-vegetated areas, reflecting the greater
difficulty in vertical measurement for images or
lidar where vegetation obscures the ground.

The standard specifies accuracy testing
and reporting guidelines, most notably on
the number and distribution of check (test)
points to acceptably quantify errors.
Checkpoint numbers increase with project area,
with 20 check points required for projects
encompassing less than 500 km², growing to
60 check points for projects up to 2,500 km².
Various numbers of vertical accuracy check
points are required depending on vegetation
presence. Checkpoint positional accuracy is
required to be at least three times better than
the accuracy of the dataset being tested.
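The two conversions quoted above can be checked with a short sketch.
The multipliers are the ones given in the text; the function names are
hypothetical, and equal, normally distributed x and y errors are
assumed.

```python
import math

RADIAL_FACTOR = math.sqrt(2)   # 1.414, per-axis RMSE to radial RMSE
CONF95_FACTOR = 2.448          # per-axis RMSE to 95% horizontal accuracy

def radial_rmse(rmse_axis):
    """Radial RMSE when x and y RMSE are equal."""
    return RADIAL_FACTOR * rmse_axis

def horizontal_accuracy_95(rmse_axis):
    """95% horizontal accuracy from an equal per-axis RMSE."""
    return CONF95_FACTOR * rmse_axis

rmse_axis_cm = 5.0  # example threshold from the text
radial_cm = radial_rmse(rmse_axis_cm)
accuracy95_cm = horizontal_accuracy_95(rmse_axis_cm)

print(f"radial: {radial_cm:.1f} cm, 95%: {accuracy95_cm:.1f} cm")
```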

The Federal Geographic Data
Committee of the United States (FGDC) has
described a third standard for measuring and
reporting positional error, appropriate for
vector or raster feature data sets as well as
for image data. It is known as the National
Standard for Spatial Data Accuracy
(NSSDA). The NSSDA is also based on
RMSE. It specifies the number and
distribution of sample points when performing an
accuracy assessment, and prescribes the
statistical methods used to summarize and
report positional error. Separate methods are
described for horizontal (x and y) accuracy
assessment and vertical (z) accuracy
assessment, although the methods differ primarily
in the calculation of summary accuracy
statistics. There are five steps in applying the
horizontal NSSDA:

• Identify a set of test points from the
  digital data set under scrutiny.

• Identify a data set or method from
  which "true" values will be determined.

• Collect positional measurements from
  the test points as they are recorded in
  the test and "true" data sets.

• Calculate the positional error for each
  test point and summarize the positional
  accuracy for the test data set in a
  standard accuracy statistic.

• Record the accuracy statistic in a
  standardized form. Also include a
  description of the sample number, true data set,
  the accuracy of the true data set, and the
  methods used to develop and assess the
  accuracy of the true data set.

NSSDA horizontal calculations are
summarized in a standard table (Table 14-1).
This shows a positional accuracy assessment
based on a set of 22 points. Data for each
point are organized in rows. The true and
data layer coordinates are listed, as well as
the difference and difference squared for
both the x and y coordinate directions. The
squared differences are summed, averaged,
and the RMSE calculated, as shown in the
boxes in the lower right portion of
Table 14-1. The RMSE is multiplied by
1.7308 to estimate the 95% accuracy level,
listed as the NSSDA. Ninety-five percent of
the time, the true horizontal errors for this
data set are expected to be less than the
estimated accuracy level of 12.9 m listed in
Table 14-1.
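The worksheet arithmetic can be sketched as follows. This is an
illustration of the sum, average, RMSE, and 1.7308 multiplier described
above, not a reproduction of Table 14-1; the function name and the four
coordinate pairs are hypothetical, and a real assessment would use at
least 20 test points.

```python
import math

NSSDA_FACTOR = 1.7308  # multiplier quoted in the text

def nssda_horizontal(true_pts, data_pts):
    """Summarize horizontal accuracy for matched (x, y) point lists."""
    if not true_pts or len(true_pts) != len(data_pts):
        raise ValueError("need two non-empty lists of equal length")
    # Squared x difference plus squared y difference for each test point
    squared = [(xt - xd) ** 2 + (yt - yd) ** 2
               for (xt, yt), (xd, yd) in zip(true_pts, data_pts)]
    total = sum(squared)
    average = total / len(squared)
    rmse = math.sqrt(average)
    return {"sum": total, "average": average,
            "rmse": rmse, "nssda": NSSDA_FACTOR * rmse}

# Hypothetical coordinates in meters (illustrative only)
true_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
data_pts = [(3.0, 4.0), (10.0, 0.0), (0.0, 5.0), (14.0, 13.0)]

result = nssda_horizontal(true_pts, data_pts)
print(f"RMSE: {result['rmse']:.2f} m, NSSDA: {result['nssda']:.2f} m")
```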

The 1.7308 factor comes from
assumptions about the statistical properties of x and
y errors. If our two variables, x and y, are
uncorrelated and follow a Gaussian
distribution, statistical theory tells us that 95% of
our errors are expected to be less than or
equal to the threshold. A thorough treatment
of the statistical foundation may be found in
the references listed at the end of this
chapter.
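A brief sketch of where the factor comes from, under the added
assumptions that the x and y errors are independent, have zero mean,
and share a common standard deviation σ. In that case the error
distance follows a Rayleigh distribution:

```latex
% Assumptions (not stated in this form in the text): e_x and e_y are
% independent, zero-mean Gaussian errors with common standard deviation \sigma.
\begin{align*}
e &= \sqrt{e_x^2 + e_y^2}, \qquad
P(e \le R) = 1 - \exp\!\left(-\frac{R^2}{2\sigma^2}\right) \\
P(e \le R_{95}) = 0.95 \;&\Rightarrow\;
R_{95} = \sigma\sqrt{-2\ln 0.05} \approx 2.4477\,\sigma \\
\mathrm{RMSE} &= \sqrt{E[e^2]} = \sqrt{2}\,\sigma \approx 1.4142\,\sigma \\
R_{95} &\approx \frac{2.4477}{1.4142}\,\mathrm{RMSE} \approx 1.7308\,\mathrm{RMSE}
\end{align*}
```

If the x and y errors have clearly different spreads, or are
correlated, the single multiplier is only an approximation.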


Table 14-1: An accuracy assessment summary table.

The NSSDA specifies between 20 and
30 well-distributed test points (Figure 14-8).
Test points should be distributed as evenly as
possible throughout the data layer to be
tested. Each quadrant of the tested data layer
should contain at least 20% of the test
points, and no two test points should be
closer than one-tenth the longest
distance for the tested data layer (d, in
Figure 14-8).
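These distribution rules are simple to check in code. The sketch below
is one possible reading, with assumptions labeled: quadrants are formed
by splitting the layer's bounding box at its midpoint, and the longest
distance d is taken as the bounding box diagonal. The function name and
sample points are hypothetical.

```python
import math
from itertools import combinations

def check_distribution(points, xmin, ymin, xmax, ymax, min_points=20):
    """Check test point count, quadrant share, and minimum spacing."""
    n = len(points)
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    counts = [0, 0, 0, 0]
    for x, y in points:
        # Quadrant index: bit 0 = east half, bit 1 = north half
        counts[(1 if x >= xmid else 0) + (2 if y >= ymid else 0)] += 1
    diagonal = math.hypot(xmax - xmin, ymax - ymin)  # assumed meaning of d
    closest = min(math.dist(p, q) for p, q in combinations(points, 2))
    return {
        "enough_points": n >= min_points,
        "quadrants_ok": all(5 * c >= n for c in counts),  # each >= 20%
        "spacing_ok": closest >= diagonal / 10,
    }

# Hypothetical 100 m x 100 m layer with a regular 5 x 4 grid of test points
grid = [(x, y) for x in (10, 30, 50, 70, 90) for y in (12.5, 37.5, 62.5, 87.5)]
good = check_distribution(grid, 0, 0, 100, 100)

# Twenty points clustered along a short diagonal in one corner
cluster = [(i, i) for i in range(1, 21)]
bad = check_distribution(cluster, 0, 0, 100, 100)

print(good)
print(bad)
```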
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Errors in Linear or Area Features 


The NSSDA as described above treats
only the accuracies of point locations. It is
based on a probabilistic view of point
locations. We are not sure where each point is;
however, we can specify an error distance r
for a set of features. A circle of radius r
centered on a point feature in our spatial data
layer will include the true point location
95% of the time. Unfortunately, there are no
established standards for describing the
accuracy or error of linear or area features.

In some instances, we may assume the
well-defined point features described in our
accuracy test above may also represent the
accuracy for vertices of lines in a data layer.
Vertices may be used as test points, provided
they are well defined and the true
coordinates are known. However, the errors at
intervening locations are not known, for
example, midway along a line segment
between two vertices. The error along a
straight line segment may be at most equal to
the largest error observed at the ends of the
line segments (Figure 14-9). If the data line
segment is parallel to the true line segment,
then the errors are uniform along the full
length of the segment. Vertices that result in
converging or crossing lines will lead to
midpoint errors less than the larger of the
two errors at the endpoints (Figure 14-9).
These observations are not true if a straight
line segment is used to approximate a
substantially curved line. However, if the line
segments are sufficiently short (e.g., the
interval along the line is small relative to the
radius of a curve in the line), and the
positional errors are distributed evenly on both
sides of the line segments, then the NSSDA
methods described above will provide an
approximate upper limit on the linear error.
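The endpoint bound can be illustrated numerically. The sketch assumes
that points on the data segment are matched to points on the true
segment at the same fractional distance along each, which is one
reasonable pairing but not the only one; the function name and
coordinates are hypothetical.

```python
import math

def error_at(true_a, true_b, data_a, data_b, t):
    """Error distance at fraction t (0 to 1) along matched straight segments."""
    tx = true_a[0] + t * (true_b[0] - true_a[0])
    ty = true_a[1] + t * (true_b[1] - true_a[1])
    dx = data_a[0] + t * (data_b[0] - data_a[0])
    dy = data_a[1] + t * (data_b[1] - data_a[1])
    return math.hypot(tx - dx, ty - dy)

true_a, true_b = (0.0, 0.0), (10.0, 0.0)
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]

# Data segment parallel to the true segment: uniform error
parallel = [error_at(true_a, true_b, (0.0, 2.0), (10.0, 2.0), t)
            for t in fractions]
# Data segment converging on the true segment: error shrinks along it
converging = [error_at(true_a, true_b, (0.0, 3.0), (10.0, 1.0), t)
              for t in fractions]
# Data segment crossing the true segment: error passes through zero
crossing = [error_at(true_a, true_b, (0.0, 2.0), (10.0, -2.0), t)
            for t in fractions]

for name, errs in [("parallel", parallel), ("converging", converging),
                   ("crossing", crossing)]:
    print(name, [round(e, 2) for e in errs])
```

In each case the error at every intermediate fraction is no larger
than the larger of the two endpoint errors.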


Attribute Accuracy 


Unlike positional accuracy, there is no
national standard for measuring and
reporting attribute accuracy. Accuracy for
continuous variables may be calculated in an
analogous manner to positional accuracy.
Accuracy for each observation is defined as
the difference between the true and database
values. A set of test data points may be
identified, the true attribute value determined for
each of those test data points, the difference
calculated for each test point, and the
accuracy summarized.


The accuracy of categorical attribute
data may be summarized using an error
table and associated accuracy statistics.
Points can be classified as correct, that is, the
categorical variable matches the true
category for a feature, or they may be incorrect.
Incorrect observations occur when the true
and layer category values are different. Error
tables, also known as error matrices,
confusion matrices, and accuracy tables, are a
standard method of reporting error in
classified remotely sensed imagery. They have
more rarely been used for categorical
attribute accuracy assessment.


An error table summarizes a two-way
classification for a set of test points (Figure
14-10). A categorical variable will have a
fixed number of categories. These categories
are listed across the columns and along the
rows of the error table. Each test feature is
tallied in the error table. The true category
and the value in the data layer are known for
each test feature. The test feature is tallied in
the error table based on these values. The
true values are entered via the appropriate
column and the data layer values are entered
via the appropriate row. The table is square,
because there is the same number of
categories in both the rows and columns. Correctly
classified features are tallied on the diagonal;
the true value and data layer value are
identical, so they are noted at the
intersection of the categories. Incorrectly assigned
category values fall off the diagonal.

Error tables summarize the main
characteristics of confusion among categories. The
diagonal elements contain the test features
that are correctly categorized. The diagonal
sum is the total number correct. The
proportion correct is the total number correct
divided by the total number tested. The
percent correct can be obtained by multiplying
the proportion correct by 100.

Per category accuracy may be extracted
from the error table. Two types of accuracy
may be calculated, a user's accuracy and a
producer's accuracy. The user relies on the
data layer to determine the category for a
feature. The user is most often interested in
how often a feature is mislabeled for each
category. In effect, the user wants to know
how many features that are classified as a
category (the row total) are truly from that
category (the diagonal element for that row).
Thus, the user's accuracy is defined as the
number of correctly assigned features (the
diagonal element) divided by the row total
for the category. The producer, on the other
hand, knows the true identity of each feature
and is often interested in how often these
features are assigned to the correct category.

The producer's accuracy is defined as
the diagonal element divided by the column
total.


Error Propagation in Spatial Analysis


While we have discussed methods for
assessing positional and attribute accuracy,
we have not described how we determine the
effects of input errors on the accuracy of
spatial operations. Clearly, input error
affects output values in most calculations. A
large elevation error in DEM cells will likely
cause errors in slope values. If slope is then
combined with other features from other
data layers, these errors may in turn
propagate through the analysis. How do we assess
the propagation of errors and their impacts
on spatial analysis?

There are currently no widely applied,
general methods for assessing the effect of
positional errors on spatial models. Research
is currently directed at several promising
avenues; however, the range of variables and
conditions involved has confounded the
development of general methods for
assessing the impacts of purely positional errors on
spatial models.


Several approaches have been
developed to estimate the impacts of attribute
error on spatial models. One approach
involves assessing errors in the final result
irrespective of errors in the original data. For
example, we may develop a cartographic
model to estimate deer density in a suburban
environment. The model may depend on the
density of housing, forest location, type, and
extent, the location of wetlands, and road
location and traffic volumes. Each of these
data sources may contain positional and
attribute errors.


Questions may arise regarding how
these errors in our input data affect the
model predictions for deer density. Rather
than trying to identify how errors in the input
propagate through to affect the final model
results, we may opt to perform an error
assessment of our final output. We would
perform a field survey of deer density and
compare the values predicted by the model
with the values observed in the field. For
example, we might subdivide the study area
into mutually exclusive census areas. Deer
might be counted in each census area and the
density calculated. We have replicated
values from each census area, so we may
calculate a mean and a variance, and the
difference between modeled and observed
values might be compared relative to the
natural variation we observe among different
census areas. We could also survey an area
through time, for example, on successive
days, months, or years, and compare the
difference between the model and observed
values for each sample time.


It may not be possible or desirable to 
wait to assess accuracy until after complet- 
ing a spatial analysis. Input data for a spe- 
cific spatial analysis may be expensive to 
collect. We may not wish to develop the data 
and a spatial model if errors in the input pre- 
clude a useful output. After model applica- 
tion, we may wish to identify the source of 
errors in our final predictions. Improvements 
in one or two data layers may substantially 
improve the quality of our predictions; for 
example, better data on forest cover may 
increase the accuracy of our deer density 
predictions. 

Error propagation in spatial models is
often investigated with repeated model runs.
We may employ some sort of repeat
simulation model that adds error to data layers and
records the impacts on model accuracy.
These simulation models often employ a
standard form known as a Monte Carlo
simulation. The Monte Carlo method assumes
each input spatial value is derived from a
population of values. For example,
landcover may range over a set of values for each
cell. Further, model coefficients may also be
altered over a range. In a cartographic
model, the weights are allowed to range over
a specified interval when layers are
combined.

A Monte Carlo simulation controls how
these input data or model parameters are
allowed to vary. Typically, a random normal
distribution is assumed for continuous input
values. If all variables save one are held
constant, and several model runs performed on
different random selections of the variable,
we may get an indication of how a variable
affects the model output. We may find that
the spatial model is insensitive to large
changes in most of our input data values, but
sensitive to small changes in a few. For
example, predicted deer density may not
change much even when landcover varies
over a wide range of values, but may depend
heavily on housing density. However, we
may also find a set of input data, or a range
of input data or coefficients, that
substantially control model output.

A Monte Carlo or similar simulation is a
computationally intensive technique.
Thousands of model runs are often required over
each of the component units of the spatial
domain. The computational burden increases
as the models become more involved, and as
the number of spatial units increases.
However, it is often the only practical way with
which to assess the impacts of uncertainties
on spatial analyses, uncertainties both in the
input data and in the model parameters.
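The one-variable-at-a-time procedure described above can be sketched
for a single cell. Everything specific here is an assumption made for
illustration: the weighted-overlay model, its weights, the input
values, and the error standard deviations are all hypothetical, and a
real analysis would repeat this over every spatial unit.

```python
import random
import statistics

def deer_suitability(forest_cover, housing_density,
                     w_forest=0.3, w_housing=0.7):
    """Hypothetical weighted-overlay model for one cell (not from the text)."""
    return w_forest * forest_cover + w_housing * (1.0 - housing_density)

def monte_carlo_sensitivity(base_inputs, vary, sigma, runs=2000, seed=42):
    """Perturb one input with Gaussian error; hold the others constant."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outputs = []
    for _ in range(runs):
        inputs = dict(base_inputs)
        inputs[vary] = rng.gauss(base_inputs[vary], sigma)
        outputs.append(deer_suitability(**inputs))
    return statistics.mean(outputs), statistics.stdev(outputs)

base = {"forest_cover": 0.6, "housing_density": 0.2}
forest_mean, forest_sd = monte_carlo_sensitivity(base, "forest_cover", 0.1)
housing_mean, housing_sd = monte_carlo_sensitivity(base, "housing_density", 0.1)

print(f"vary forest cover:    output sd = {forest_sd:.3f}")
print(f"vary housing density: output sd = {housing_sd:.3f}")
```

With these made-up weights the output spread is larger when housing
density is perturbed, which is the kind of sensitivity ranking the
method is meant to reveal.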


Summary 


Data standards, data accuracy
assessment, and data documentation are among the
most important activities in GIS. We cannot
effectively use spatial data if we do not
know its quality, and the efficient
distribution of spatial data depends on a common
understanding of data content.

Data may be inaccurate due to several
causes. Data may be out of date, collected
using improper methods or equipment, or
collected by unskilled or inattentive persons.

Accuracy is a measure of error, a
difference between a true and represented value.
Inaccuracies may be reported using many
methods, including a mean value, a
frequency distribution, or a threshold value. An
accuracy assessment or measurement applies
only to a specific data set and time.


Accuracy should be recognized as dis- 
tinct from precision. Precision is a measure 
of the repeatability of a process. Imprecise 
data collection often leads to poor accuracy. 

Standards have been developed for
assessing positional accuracy. Accuracy
assessment and reporting depend on
sampling. A set of features is visited in the field,
and the true values collected. These true
values are then compared to corresponding
values stored in a data layer, and the differences
between true and database values quantified.
An adequate number of well-distributed
samples should be collected. Standard
worksheets and statistics have been developed.

Data documentation standards have
been developed in the United States. These
standards, developed by the Federal
Geographic Data Committee, are known as the
Content Standard for Digital Geospatial
Metadata. This standard identifies specific
information that is required to fully describe
a spatial data set.


Study Questions
14.1 - Why are standards so important in spatial data?

14.2 - Can you describe processes or activities that are greatly helped by the
existence of standards?

14.3 - What are the differences between accuracy and precision?

14.4 - How do mean and frequency thresholds differ in the way they report positional
error?

14.5 - What are some of the primary causes of positional error in spatial data?

14.6 - Describe each of the following concepts with reference to documenting spatial
data accuracy: positional accuracy, attribute accuracy, logical consistency, and
completeness.

14.7 - What is the NSSDA, and how does it help us measure positional accuracy?

14.8 - What are the basic steps in applying the NSSDA?

14.9 - What are the constraints on the distribution of sample points under the
NSSDA, and why are these constraints specified?

14.10 - What are good candidate sources for test points in assessing the accuracy of a
spatial data layer?

14.11 - How are errors in nominal attribute data often reported?

14.12 - What are metadata, and why are they important?


15 New Developments in GIS


Introduction 


As every economist, weather forecaster,
or politician knows, predicting the future is
fraught with peril. Near-term predictions
may be safe; if times are good now, they will
probably be good next month. However, the
farther one reaches into the future, the more
likely they will be wrong. This chapter
describes technologies that may become
widespread. It discusses future trends, with
the expectation that many of these
speculations will not wholly come true.

Many changes in GIS are based on
advances in computers and other electronic
hardware. Computers are becoming smaller
and less expensive. This is true for both
general purpose machines and for specialized
computers, such as ruggedized, portable
tablet computers. The wizards of
semiconductors continue to dream up and then produce
impossibly clever devices. Given current
trends, we should not be surprised in the
future if a pea-sized device holds all the
published works of humankind. Computers may
gain personalities, recognize us as
individuals, respond entirely to voice commands,
and routinely conjure three-dimensional
images that float in space before our eyes.
These and other developments will alter how
we manipulate spatial data.

Changes in GIS will also be due to the
growing ubiquity of high speed, wireless
connections. If our data are always
available, we will interact with them differently.
We can more easily see how things should
be in the field, and compare them to how
they are, for example, a wiring diagram for a
roadside telephone interchange panel, or a
building site plan vs. stakeout. Records may
be available at any time, everywhere,
offering instant access to an agricultural field's
fertilization history, a water main's flange
size, or a bridge's inspection records,
improving decisions in the field and streamlining
data management.

Change is also due to increased
sophistication in GIS software and users, and
increased familiarity and standardization.
Change will be driven by new algorithms or
methods, for example, improved data
compression techniques that speed the retrieval
and improve the quality of digital images.
Specialized software packages may be
crafted that turn a multi-day, technically
complicated operation into a few mouse
clicks. These new tools will be introduced as
GIS technologies continue to evolve and
will change the way we gather and analyze
spatial data.




GNSS 


Three trends will dominate GNSS
innovation over the next decade: multi-constellation
and multisignal GNSS receivers,
miniaturization, and system integration.
Multi-GNSS receivers will continue to take
advantage of distinct satellite constellations.
GNSS has been unable to provide 10 cm
(subfoot) position accuracies in thick forests,
deep valleys, or city centers. Multi-GNSS
tracking systems exist now; receivers
supporting GPS, GLONASS, Galileo, and the
Compass system have been developed, further
increasing accuracy, availability, and
reliability. Future, low-cost GNSS receivers will
have hundreds of channels, and track tens of
satellites even when under heavy forest
canopies, in canyons, and among tall buildings,
bringing real-time precise positioning to
everyone.

Dual channel GNSS chips costing less
than $250 now provide centimeter
accuracies, and prices will continue to drop (Figure
15-1). Earlier inexpensive GNSS receivers
tracked a single frequency, e.g., L1 in the
U.S. GPS system, and provided accuracies
in the few to 10s of meters range.
Measurements in a second frequency allow
reduction of ionospheric and atmospheric transit
errors, leaving smaller system and clock
errors that are in the tens of centimeters to
centimeter range. New systems integrate
improved hardware with suitable
processing. Systems still require substantial
modification to become turn-key, easy-to-use,
and robust, but within the next decade,
centimeter-level accuracies will be available
in real time, for tens of dollars in hardware
cost.


GNSS receivers will cost less, shrink in
size and weight, and increase accuracy for
some time to come, and these improvements
will spur even more widespread adoption of
this technology (Figure 15-2).
Microelectronic miniaturization is helping shape the
GNSS market. As GNSS use grows and
methods improve, single chip
GNSS systems have emerged, and these
chips are decreasing in size. GNSS chips
smaller than a postage stamp are available,
including some that may be integrated into
common electronic devices. Many vendors
are well on a path to system integration, and
it will become more common to embed the
antenna, receiver, supporting electronics,
power supply, and differential correction
radio receivers in a single chip or small
circuit board. Some of these integrated systems
are smaller than most GNSS antennas of a
decade ago, and systems will continue to
shrink. A button-sized GNSS is not far off.


As receivers shrink in size and cost, it
becomes practical to collect positional
information on smaller individual objects. While
GNSS is unlikely to help you find your keys,
small GNSS receivers will collect spatial
data for smaller objects. For example, a few
years ago it was uneconomical to track
objects smaller than a cargo ship. Now
trucks or containers are routinely followed.
In the near future, it may be common to
track individual packages.

GNSS miniaturization means we will
directly collect much more data in the field
than in times past. City engineers may study
traffic patterns by placing special-purpose
GNSS receivers into autos. How long does
the average commute take? How much of
the time is spent sitting at stop signs or
lights, and where is the congestion most
prevalent? How is traffic affected by
weather conditions? Analyses of traffic
networks will become substantially easier with
small-unit GNSS. Disposable GNSS
receivers may be pasted, decal-like, on
windshields by the thousands, to transmit their
data back to a traffic management center.


Ubiquitous, inexpensive, or free
differential correction signals are substantially
improving the accuracy commonly achieved
with GNSS. Many U.S. states and national
governments abroad will establish more
complete coverage. Virtual Reference
Station (VRS) networks promise to allow
submeter and even near-centimeter level
positioning in real time. Commercial
solutions will be further developed and made
less expensive.

GNSS systems will add functions,
including the ability to take photos or videos
and attach them to geographic features in a
database (Figure 15-3). The old adage "a
picture is worth a thousand words" may be
modified to "a picture saves a thousand
hours." These systems will greatly aid
planning, management, and analysis by more
easily providing images in GIS. For
example, the type, relative location, and condition
of public utilities such as fire hydrants may
be described with both photos and
alphanumeric data collected in a database. If a work
order is required to repair a hydrant, a
photograph may be taken in the field and tagged to
the work order. This photograph may be
inspected to verify the type of hydrant, to
perhaps identify the tools needed for repair,
or to recognize which specific parts are
required for maintenance.


Fixed and Mobile Three-Dimensional Mapping


GNSS is also being combined with new
advances in ground-based laser scanning to
increase the scope, accuracy, and efficiency
of spatial data collection. Three-dimensional
scanning devices have been developed that
measure the horizontal and vertical location
of features (Figure 15-4). This scanning is
necessary because many features are
modified over time, for example, roads are
changed, buildings are extended, extra
supports may be added to towers, or oil
refineries may be re-plumbed. Inventories must be
updated to record the features as built, rather
than as designed or observed during the
previous inventory. A three-dimensional
scanning laser may be combined with a precise
GNSS receiver to measure the X, Y, and Z
coordinates of important features. The
GNSS is used to determine the location of
the scanning laser. The horizontal and
vertical offsets from the scan point are measured
by the laser. These measurements are
combined with coordinate geometry to calculate
the precise positions for all features scanned
in the field.
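The coordinate geometry step can be sketched as a polar-to-rectangular
conversion. The conventions here are assumptions for illustration:
azimuth is measured clockwise from north (+Y), the vertical angle is
measured up from horizontal, and instrument height and Earth curvature
are ignored. The function name and values are hypothetical.

```python
import math

def target_position(scanner_xyz, slope_range, azimuth_deg, vertical_deg):
    """X, Y, Z of a scanned point from the scanner's GNSS position."""
    x0, y0, z0 = scanner_xyz
    azimuth = math.radians(azimuth_deg)
    vertical = math.radians(vertical_deg)
    horizontal = slope_range * math.cos(vertical)  # range projected to level
    return (x0 + horizontal * math.sin(azimuth),
            y0 + horizontal * math.cos(azimuth),
            z0 + slope_range * math.sin(vertical))

scanner = (1000.0, 2000.0, 50.0)  # hypothetical GNSS-derived position, meters

due_east = target_position(scanner, 100.0, 90.0, 0.0)
north_up = target_position(scanner, 10.0, 0.0, 30.0)

print([round(v, 2) for v in due_east])
print([round(v, 2) for v in north_up])
```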


The trend of multi-technology
integration will accelerate for rapid,
centimeter-level mobile mapping. Multichannel GNSS
combined with other positioning systems
will provide highly accurate locations, and
three-dimensional laser scanners and 360
degree image data collection will allow
faster collection of X, Y, and Z coordinates.

Figure 15-4: Mapping systems combined with GNSS and three-dimensional imaging.


Combined with GNSS systems and
mounted on mobile platforms,
three-dimensional laser mapping systems will collect
highly accurate data accessible by anyone
with a traffic-enabled GNSS. Approaching
drivers may be forewarned, travel times
calculated, and new suggested routes identified.
One can imagine self-driving automobiles
that navigate via a combined GNSS/LIDAR/
GIS, using systems to avoid collisions via
real-time distance measurements and
wireless communications with "nearby"
automobiles.


Such mobile systems will help improve the currency and accuracy of
digitized transportation networks, and anything visible from them.
Roads may be digitized while driving, as well as every building,
light pole, sign, bench, tree, or any other three-dimensional
structure visible from them. Efforts will move from a focus on the
development of integrated, turn-key data collection systems to
software and methods that streamline workflow, so that data may
travel from the device to the database with as little human
intervention as possible.


Automobile systems would rely on multiple technologies. A GNSS would
locate the vehicle to within a few tens of centimeters in real time.
Three-dimensional data on road centerline, edges, curbs, adjacent
poles, and other important features would be identified for the
trajectory ahead. When combined with an on-vehicle laser scanner, the
system may identify moving automobiles and distinguish them from
unexpected stationary objects within the roadway, or other changes in
conditions. A combination of LiDAR, RADAR, and cameras may help
identify objects, and compare their location to expected features in
on-board spatial data. Automobiles may be reliably identified. Aids,
such as virtual illumination on the windshield, may be used to
highlight other vehicles or road edge when visibility is poor
(Figure 15-5), flag upcoming hazards or turns, or warn of unexpected
conditions. The autos could use mapped information on shoulder width,
nearby off-ramps, the road ahead, or other structural information to
execute the appropriate driving maneuvers and avoid accidents.


Figure 15-5: LiDAR and other data may be fused and displayed in autos, improving driver safety.
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Ground Based Positioning 


Ground-based positioning systems may find increased adoption. Although
GNSS systems provide remarkable positioning information, they are
limited. GNSS signals can't pass directly through most solid objects.
Buildings, mountains, and dense forest canopies entirely or partially
block GNSS signals, yielding a reduced set of observations and reduced
spatial accuracy, or at worst, loss of position. GNSS doesn't work in
many indoor locations.

Signal strength is one limitation of GNSS systems. Satellite launch
constraints force the use of relatively low-power transmitters, and
transmission distances are quite large, further dissipating energy.
Signals at the receiver are often weak, and difficult to distinguish
from multi-path transmissions.

Ground-based positioning services are under development that solve
many of these problems. These rely on a set of distributed
transmitters and precisely surveyed locations, and are similar to GNSS
in using the same basic principle of range measurements and
triangulation (Figure 15-6). Each ground-based station transmits a
coded signal, which is decoded in a receiver to calculate a range, and
then precise location. Centimeter-level positioning is possible, and
these ground-based systems may be used independently or in conjunction
with GNSS positioning. Ground-based antennas transmit signals that are
orders of magnitude stronger than GNSS (Figure 15-7). This greatly
enhances reception in sub-canopy environments. Since transmitters may
be small, dense deployments across high buildings may effectively
remove the urban canyon limitations.
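
As a rough illustration of how ranges from surveyed transmitters yield a position, the sketch below solves the two-dimensional case with three stations. It assumes error-free ranges and planar coordinates in meters. The function name is hypothetical, and operational systems use more stations, three dimensions, clock terms, and a least-squares adjustment rather than this direct solution.

```python
import math


def position_from_ranges(stations, ranges):
    """Locate a receiver from ranges to three surveyed stations (2-D).

    Each range defines a circle around its station. Subtracting the
    first circle equation from the other two removes the squared
    unknowns and leaves two linear equations in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = stations
    r1, r2, r3 = ranges

    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; position is ambiguous")

    # Cramer's rule for the 2 x 2 linear system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


if __name__ == "__main__":
    stations = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
    truth = (30.0, 40.0)
    ranges = [math.dist(truth, s) for s in stations]
    print(position_from_ranges(stations, ranges))
```

The collinearity check reflects a practical constraint: transmitters arranged in a line give poor or ambiguous geometry, so deployments benefit from stations spread around the area of interest.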


Figure 15-6: Ground-based positioning systems, here used in conjunction with GNSS.


Figure 15-7: A ground-based transmitter.


Separate systems have been proposed for indoor positioning. These fuse
a number of technologies, including Bluetooth and WiFi beacon
measurements, and "dead reckoning," using a physical or electronic
gyroscope to measure distance and direction traveled from the last
known location, and coordinate geometry as described in Chapter 5 to
update current position. Married with 3D interior GIS data, they may
support many convenience, energy, efficiency, and safety of life
applications.
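
Dead reckoning itself is simple coordinate geometry, as the following minimal sketch suggests. It assumes planar coordinates in meters and headings measured clockwise from north. The function name and the list of (distance, heading) steps are hypothetical, and a real indoor system would periodically correct the accumulated drift with beacon fixes.

```python
import math


def dead_reckon(last_known_xy, steps):
    """Update a position from a sequence of (distance_m, heading_deg) steps."""
    x, y = last_known_xy
    for distance_m, heading_deg in steps:
        heading = math.radians(heading_deg)
        x += distance_m * math.sin(heading)  # easting change
        y += distance_m * math.cos(heading)  # northing change
    return x, y


if __name__ == "__main__":
    # Walk 10 m north, then 5 m east, from the last known location.
    print(dead_reckon((0.0, 0.0), [(10.0, 0.0), (5.0, 90.0)]))
```

Because each step is added to the previous estimate, measurement errors accumulate with distance traveled, which is why dead reckoning is fused with other position sources rather than used alone.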


These systems are expected to be more 
costly on a per-unit area coverage basis than 
satellite GNSS signals, which are essentially 
free to the end user. However, ground-based 
systems may still find application, particu- 
larly if robust, centimeter-level positioning 
is required for self-driving vehicles or autonomous
robotic navigation on streets or
through buildings.
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Datum Modernization 


New datum realizations will be released in the next few years for U.S.
territories, based on improved measurements and a change in the basic
model and methods for datums. Datums in the U.S. have been based on a
non-Earth-centered ellipsoid to maintain compatibility with previous
systems and measurements. The disadvantages of this system now
outweigh the advantages, and so a new official datum system is
planned, originally proposed for North America in 2022, but now likely
for 2024 or later.

As described in Chapter 3, the continents are moving about the Earth
on plates, sometimes at rates exceeding 2.5 cm (one inch) a year. Over
several decades, this drift leads to changes in the relative positions
among points on different plates (Figure 15-8). In addition,
geodesists must factor the total amount of movement into their
development of datums, because measurements have been made over
several decades, so the relative position of any location depends on
both the time and location of the respective measurements. Further,
the calculations require we establish a stationary reference frame
against which to measure points. Because of differences in our
starting point, and in how we account for crustal movement, there are
large differences between the NAD83 family of datums used primarily in
North America, and the ITRF datums used by most of the rest of the
world.

The ITRF is an Earth-centered system based on measurements of the X,
Y, and Z locations and velocities of points. It places the origin of
the adopted ellipsoid at the best estimate of the center of mass of
the Earth at the time of each adjustment. The post-1986 NAD83 datums
are similar in that they are centered on an Earth model. In contrast
to the ITRF system, the NAD83 datums have not adopted the best
measurements of the Earth's center, but rather a position compatible
with older NAD83 datums. This center is assigned a value relative to
average crustal velocities on the North American tectonic plate,
rather than a global network. There were many good reasons for
maintaining the old origin, primarily because it maintained coordinate
compatibility with older NAD83 datums, could be deployed rapidly, and
was integrated with a relatively dense network of continuously
operating stations within North America.


Figure 15-8: Continuous crustal deformation model from ITRF2014, showing velocity vectors of the Earth's surface, measured at ITRF stations across the globe. Note the relatively high velocities of the central Pacific Plate and the various directions of overall plate travel. These movements must be factored into future datum realizations (with permission, H. Drewes, 2017, ACKIM based on ITRF2014, IAG Kobe).


With the development of the ITRF, the active participation of the NGS,
and the integration of CORS stations into the ITRF network, there is
strong impetus to harmonize datums in North America with international
efforts. Improved global measurements will support more accurate
horizontal and vertical datum development, and help support precise,
rapid, centimeter-level positioning worldwide.


As noted in Chapter 3, there is a plan to update the datums used in
North America, with the introduction of the North American Terrestrial
Reference Frame of 2022 (NATRF2022). This will initially remove much
of the positional difference between ITRF/WGS84 and U.S. datums. It
will entail a shift, often up to two meters (six feet), from
NAD83(2011) coordinates to NATRF2022 coordinates.

Current plans fix the NATRF2022 to the North American and related
tectonic plates. This will make position shifts due to continental
drift small, and so generally result in stable locations across time
over most of North America. This is different than the ITRF system,
which generally holds the average shift across all tectonic plates to
be zero, thereby distributing shifts across the globe. The ITRF and
NATRF2022 positions will diverge over time. A datum transformation
between these two systems will be produced, and it will be time
dependent. It will likely be in the form of an initial X, Y, and Z
transformation, and then a time-dependent piece in which the time
difference between input and output datums is accounted for, to
incorporate relative continental drift.
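
The general shape of such a transformation can be sketched as an initial offset plus a velocity multiplied by the elapsed time. The example below is only a simplified illustration under stated assumptions: it uses a pure translation and a constant linear velocity in Earth-centered X, Y, Z coordinates, and all numeric values are invented for demonstration. Published transformations typically also include rotation and scale terms and their rates, and the function name here is hypothetical.

```python
def transform_between_epochs(xyz, translation_m, velocity_m_per_yr,
                             epoch_in, epoch_out):
    """Apply an initial X, Y, Z shift plus a time-dependent drift term.

    xyz               -- input Earth-centered coordinates (meters)
    translation_m     -- initial datum shift in X, Y, Z (meters)
    velocity_m_per_yr -- relative plate velocity in X, Y, Z (meters/year)
    epoch_in/out      -- decimal years of the input and output positions
    """
    elapsed_years = epoch_out - epoch_in
    return tuple(
        coordinate + shift + rate * elapsed_years
        for coordinate, shift, rate in zip(xyz, translation_m, velocity_m_per_yr)
    )


if __name__ == "__main__":
    # Invented values: a 2 cm/yr drift in X accumulates to 20 cm in a decade.
    print(transform_between_epochs(
        (100.0, 200.0, 300.0), (1.0, -1.0, 0.5), (0.02, 0.0, -0.01),
        2020.0, 2030.0))
```

The time term is why coordinates in such a system need an epoch attached: the same point has different coordinates in the two frames depending on when it is evaluated.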


The ten-year plan also describes the process for improving the
vertical datum for North America. Improvements will be based on
gravity measurements across the hemisphere, accounting for changes in
gravity fields over time, and also for the rises in mean sea level.
Much as with the horizontal datum, it will be based at least in part
on an integrated, global set of satellite-based measurements, and tied
to ITRF measurements. The new vertical datum will allow the
calculation of heights tied to a measurement epoch.
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Improved Remote Sensing 


Spatial data collection will be substantially improved with the
continuing advances in remote sensing. More satellites, higher spatial
and temporal resolution, improved digital cameras, and new sensor
platforms will all increase the array of available data. We will be
able to sense new phenomena, and locate previously measured features
with increased precision and accuracy. Satellite-based systems will
continue to increase in resolution and coverage, in particular the
frequency of data collection. The Worldview system is a salient
example of this trend (Figure 15-9). Three satellites have been
launched, the latest with a 0.25 m spatial resolution. This is better
than most midscale aerial photographs of a decade ago. Similar
improvements in resolution and coverage are in progress for other
satellite image providers, increasing the frequency and types of
images available for medium- to high-resolution mapping.

Parallel improvements continue in aerial image acquisition. Aerial
cameras increase in spatial resolution, meaning increasing
availability of detailed images, with higher radiometric sensitivity,
yielding a broader range of applications. Many systems have higher
radiometric breadth, leading to routine collection of more than the
visible wavelength spectrum. National aerial acquisition programs are
integrating these improvements, with NAIP images commonly provided at
a one-meter resolution, up from the common two-meter resolution a few
years past. The USGS and other organizations are providing subfoot
resolution images, perhaps nationwide. Individual light poles, curbs,
and even parking lot cracks may be observed in these images, rendering
them a rich source of spatial data.


Figure 15-9: An image from the Worldview-3 satellite (courtesy DigitalGlobe).


LiDAR data are another example of improved remote sensing. LiDAR
systems are increasing in accuracy and resolution and declining in
cost, with coordinate data commonly paired with digital aerial images.
Commercial LiDAR systems in the recent past collected a data point
every few square meters, while current systems routinely collect
several samples per square meter. Soon, tens or hundreds of samples
per square meter will be common, allowing unprecedented spatial
definition.

LiDAR data are dropping in cost, and county to statewide LiDAR mapping
will likely become common. Fusion of LiDAR data with other image and
spatial data will continue, and surely create new opportunities and
applications.

Three-dimensional GIS will be more widely developed and practiced due
to data provided by three-dimensional, ground or near-ground level
LiDAR (Figure 15-10). LiDAR systems carried on foot, on autos or
drones, or in high-flying aircraft will provide feature X, Y, and Z
coordinates from various vertical, oblique, and horizontal
perspectives. When combined, these allow truer three-dimensional
characterization of space. Data development, management, analysis, and
visualization are currently taking place across architecture, CAD,
surveying, and GIS software and disciplines, and substantial
development and fusing across these disciplines is in the offing.

Advances in full-sized and miniature aircraft (Figure 15-11) are
leading to increased availability of a broader range of aerial
imagery. Pioneered primarily by NASA decades ago, this technology is
making the leap to broad commercial application. Some experimental
UAVs may fly faster and turn tighter than many planes carrying human
pilots. Specialized payloads may be carried cheaply on these craft,
for long periods of time, and in more dangerous conditions than in
human-piloted aircraft. Small craft may be deployed quickly, and have
demonstrated ultra-precise three-dimensional measurements at very high
resolutions for site or small, area-specific applications.


Figure 15-10: A photograph.


Figure 15-11: UAVs fitted with cameras and advanced GNSS may collect sub-centimeter accuracy spatial data from LiDAR and aerial images.

There is currently intense development of small helicopters or
airplanes outfitted with cameras, LiDAR, and GNSS position, control,
and telemetry electronics. This allows preprogrammed flight paths and
on-board "intelligence" to modify steering in response to wind, rain,
or other in-flight conditions. UAVs have completed transoceanic
flights with guidance only at takeoff and landing, and UAVs are
routinely collecting weather, water quality, and other environmental
data.

There have been parallel efforts to
reduce the size and weight, and improve the
accuracy, of LiDAR systems, as most current
sensor systems are too large and heavy for 
most small UAVs. LIDAR system weight 
increases UAV size, limits collection time, 
and increases the hazards of collection, but 
there has been great progress in system min- 
iaturization, which should continue. 


Three-Dimensional GIS, Exterior 
and Interior 


Software is expanding to collect, manage, and analyze data that
provide rich, three-dimensional representations. For most of its
history, GIS has most commonly treated a third dimension as an
attribute, a value stored for a location, but did not support an
integrated three-dimensional model. Topology, overlay, and most
analyses, particularly in a vector model, require a planar geometry.
Three-dimensional analysis was often done via specialized engineering
or design tools such as CAD or visualization software.


Data development bottlenecks slowed the development of
three-dimensional GIS. It has been impossible to quickly and
accurately measure all three dimensions. Drones, LiDAR, and
high-resolution cameras have filled this gap, combining x-y-z
coordinate measurements with centimeter or better resolution images.
Systems are now available which allow both sophisticated measurement
and rendering of three-dimensional structures (Figure 15-12).


Remote sensing is also venturing indoors, enabling three-dimensional
data collection of building interiors. These data are useful for
maintenance, security, planning, emergency evacuation, and damage
assessment. Backpack-mounted data acquisition systems collect
positions via LiDAR and color and texture from multiple cameras
(Figure 15-13). These data may be stitched together to provide full
three-dimensional representations, and combined with exteriors to
create a full coordinate and color record of building interiors and
exteriors.


Figure 15-12: An example of extremely high-resolution LiDAR and image coordinate data.


Figure 15-13: An example of indoor, three-dimensional data.


Software vendors are developing data models and workflows to use these
data. For example, ESRI has introduced an indoor mapping product that
supports development, analysis, planning, and output in a 3D building
model. These provide more than virtual walk-throughs, as useful as
these are, to allow distance, area, and volume measurement, fixed and
movable asset management, and engineering, environmental conditioning,
and remodeling calculations.
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Cloud-Based GIS 


Cloud-based GIS provides data storage, analysis, and display
capabilities over the internet, with services usually provided from
one or multiple remote locations. Cloud-based computing includes
broadly accessible internet mapping applications, but also includes
data storage and a full suite of software-supported analytical
capabilities, something that has generally followed a local or private
network architecture. During the recent past, GIS software primarily
resided on a local or closely networked hard disk, and ran on the
central processing unit of a local computer. You downloaded data to a
local hard drive, and purchased software to physically install on the
local computer, although the software may have referenced licenses or
other resources on a (usually) proprietary server on another computer.
Cloud-based computing envisions many of these resources being provided
from distant sources, with local computers perhaps serving only as a
display and command entry portal; data, software, and processing may
all be elsewhere.


Cloud computing has many potential advantages. There may be a lower
total cost of ownership, because you may only need to use a set of
software occasionally, and can pay as you need it, rather than a fixed
price irrespective of total use. Economies of scale in data storage
and maintenance or in computing power may be favorable, as well as the
centralization of specialized technical support. Additional capacity
may be added as needed, as market share grows, or specific project
demands increase. Resources may be scaled up or back as needed. New
capabilities may be rented or tested more easily, and software
functions may better adopt a rental model and pricing structure.

Cloud computing may also provide faster, broader, safer, and more
continuous data access. Internet connections are increasing in speed,
and solid-state memory in large installations provides faster access
yet. Large server facilities may be on continuously, always
accessible. Large server arrays may be outfitted with proper backup
and protection, including data mirroring at distinct locations.
Mirroring provides data redundancy, because the same data are stored
concurrently at different physical facilities, often miles or even
countries apart. If a fire, flood, or other disaster befalls one data
server, the mirror image is likely to remain intact.


Internet mapping is perhaps the simplest and most common form of
cloud-based computing. Many internet applications allow users to
compose maps on a Web page. The individual user has some control over
the data layers shown, the extent of the mapped area, and the symbols
used to render the map. The internet is different from other
technologies because it allows a wide range of people to
custom-produce maps. Each user may choose her own data and
cartographic elements to display. The user is largely free from any
data development chores and thus needs to know very little about data
entry, editing, or the particulars of map projections, coordinates, or
other details required for the production of accurate spatial data.
Typically, the map itself is the end product, and may be used for
illustration, or to support analyses that will be performed entirely
within the user's head.

These internet mapping applications are particularly appropriate when
a large number of users need to access a limited number of data layers
to compose maps. The internet users may select the themes, variables,
and symbolization, in contrast to a static map graphic, in which a
website cartographer defines the properties of each map.


Because most internet mapping applications are built for users who
have little knowledge of spatial data, maps, and analysis, the suite
of spatial operations allowed is usually very sparse. Most internet
mapping is currently limited to creating simple displays. This is
changing, as query, distance functions, and basic tools are provided,
albeit in very simple forms.


As noted in Chapter 7, web mapping services are another step toward a
cloud-based GIS model. Data are stored somewhere "on the cloud," and a
specific link to them is provided. Data are accessible after forging a
connection as if they were from any other disk source, at least from
the view of the accessing software. The GIS program doesn't
distinguish between local and cloud data once the connection is made.
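
To make the idea of a "specific link" concrete, the sketch below builds a map request following the OGC Web Map Service (WMS) 1.3.0 GetMap convention. The server address and layer name are placeholders, not real services, and the helper function is hypothetical. The example only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder server and layer, for illustration only.
    print(wms_getmap_url("https://example.org/wms", "roads",
                         (-10400000, 5600000, -10300000, 5700000), 512, 512))
```

A desktop GIS issues requests of this kind behind the scenes each time the map view changes, which is why the remote layer behaves much like a local one once the connection is defined.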

This brings up one major limitation of the cloud model, its dependence
on a fast connection to the cloud of resources. Slow internet speeds
become a hindrance, particularly with large data or image sets that
characterize many spatial analyses. Each zoom, pan, or layer addition
may require a scene to be repainted, involving the movement of
billions of pixels through the web connection. This may be overcome to
some extent with local caching, anticipatory downloading (for example,
pulling data in a wider area than that immediately viewed), and other
software techniques, but only within limits. In many instances there
is no substitute for extremely fast internet speeds. Widespread, fast
internet should be forthcoming, but access may depend on internet
demand as well as supply.
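
Local caching and anticipatory downloading can be sketched with a small tile cache. The example assumes map data are served as tiles addressed by (zoom, column, row), and that a `fetch` function supplied by the caller retrieves one tile over the network. The class name, its methods, and the one-tile margin are hypothetical choices for illustration.

```python
from collections import OrderedDict


class TileCache:
    """Least-recently-used tile cache that also prefetches neighbors."""

    def __init__(self, fetch, capacity=256):
        self.fetch = fetch          # function: (zoom, col, row) -> tile data
        self.capacity = capacity    # maximum number of tiles held locally
        self.tiles = OrderedDict()

    def get(self, key):
        if key in self.tiles:
            self.tiles.move_to_end(key)      # mark as recently used
            return self.tiles[key]
        data = self.fetch(key)               # slow path: network request
        self.tiles[key] = data
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)   # evict least recently used
        return data

    def view(self, zoom, col, row, margin=1):
        """Return the viewed tile, pulling in a wider area than is shown."""
        for d_col in range(-margin, margin + 1):
            for d_row in range(-margin, margin + 1):
                self.get((zoom, col + d_col, row + d_row))
        return self.get((zoom, col, row))


if __name__ == "__main__":
    requests = []
    cache = TileCache(lambda key: requests.append(key) or key)
    cache.view(10, 5, 5)   # fetches the tile and its 8 neighbors
    cache.view(10, 6, 5)   # panning east needs only 3 new tiles
    print(len(requests))
```

Panning by one tile reuses most of what was already downloaded, which is the benefit described above. The limit is also visible: a jump to a distant area or a new zoom level misses the cache entirely and depends again on connection speed.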


To date, analytical tools delivered over the internet are still quite
rudimentary, and may likely remain so for some time. Robust, correct
operation of an analytical tool is difficult to provide in many
instances, and requires a sizeable investment. Systems for delivery,
payment, and protection for a broad, interacting suite of geospatial
tools will take development, both technically and culturally.


Open GIS 
Open Standards for GIS 


Open standards in computing seek to reduce barriers to sharing
programs, data, and information. Spatial data structures may be very
complex, perhaps more than many other kinds of data. Data may be
raster or vector, real or binary, or represent point, line, or area
features. In addition, different software vendors may elect to store
their raster imagery using different formats, and data may be
delivered on different physical media, or formatted different ways. If
a person orders an image in one format, but her computing system does
not support the physical media on which the data are written, or does
not understand the file structures used to store the image, then she
may not be able to use these data. Incompatible systems are generally
described as non-interoperable, and open standards seek to remove this
non-interoperability.

The development of open standards in computing is driven by the notion
that the larger user community benefits when there are no technical
barriers that inhibit the free exchange of data and methods. Open
standards seek to establish a common framework for representing,
manipulating, and sharing data. Open standards also seek to provide
methods for vendors and users to certify compliance with the standard.
Standards have been developed in a number of endeavors: for example,
the ISO 9600 specifications for physical storage formats allow any
manufacturer, data developer, or user to build, read, write, or share
data on hard drives, optical disks, tapes, or other storage devices.

Businesses and many other organizations by their nature have a
proprietary interest in the spatial data entry, storage, and methods
they produce. Many vendors survive by the revenue their GIS products
generate, and so have a strong interest in protecting their
investments and intellectual property. However, the developers also
may spur adoption of their GIS packages and speed up the development
of complementary software by making the internal workings of some
portions of their GIS packages public knowledge, for example, by
publishing the data structures and formats used to store their spatial
data. Thus, these vendors also have a strong interest in making parts
of their system open to the public.

Open standards for spatial data are the responsibility of the Open
Geospatial Consortium. The Open Geospatial Consortium has developed a
framework to ensure interoperability. They do this by defining a
general, common set of base data models, types, domains, and
structures, a set of services needed to share spatial data, and
specifications to ease translation among different representations
that are compatible with the Open Geospatial standards. Data developed
by a civil engineer and stored in a raster format on a Unix version of
Arc/Info should be readily accessible to a soil scientist using GRASS
on another operating system.


Open standards in GIS are relatively new. While most of the large
software vendors, data developers, and government and educational
organizations are members of the Open Geospatial Consortium, some
components of the standard are still under development. In the future,
there will be increased emphasis on compliance to the Open Geospatial
standards.


Open Source GIS 


Open source software is different from most other software in that it
is distributed free, along with the source code. The open source
organization (www.opensource.org) requires that the software is not by
design restricted to a specific operating system or other technology,
that there can be no royalties, and that there be no explicit
discrimination against fields of endeavor, persons, or groups. But the
main, defining characteristic of open source software is an open,
grassroots network of collaborators developing, documenting, and
freely sharing source code.


There is open source software of many types, from operating systems to
word processors, and including GIS. Open source GIS software projects
are directed at a range of applications, and notable examples range
from general-purpose GIS (e.g., GRASS, FMaps) to specific utilities
(e.g., MapServer for Web-based spatial data display, query, and
analysis) or toolkits to support GIS software development (e.g., GDAL,
shapelib).


Open source use is a large and growing phenomenon for many reasons.
High software costs are driving many organizations toward open source
software. Licenses for some commercial products are tens to hundreds
of thousands of dollars annually for some large organizations. If
these organizations employ staff programmers, open source GIS may meet
geoprocessing needs at a reduced cost.

Combined GIS and analysis has grown particularly in the open source
statistical package R. A broad range of spatial data formats are
supported, including shapefiles, geopackages, geodatabases, common
raster formats, government and ISO standard interchange formats, and
standard image types, so that there are low barriers to data transfer.
Packages for direct access to general and specific spatial databases
are freely available. Given the thousands of R developers and open
code format, almost every analysis method available in proprietary
packages is also replicated in R packages, plus thousands more. While
R exhibits many disadvantages relative to commercial GIS software in
production or large-organization settings, due to its lack of
integration, steep learning curve, often relatively slow execution
times, and complexity, it often provides many advantages in a broad
range of specific analyses.

Many organizations use open source GIS because commercial products may
not provide the required functions or capabilities. Three-dimensional
structural analysis tools may not exist that meet the requirements of
a mining company, and so they may develop specific applications. This
development may be more efficient and less expensive in an open source
environment.

Open source use is expanding in many countries because of specific
government initiatives. China, India, and Brazil have all supported
open source software in general, and operating systems in particular,
to maintain independence from foreign firms, reduce costs to
government and local business, and develop local information
technology expertise. Because these nations are home to more than a
third of the world's population, their actions alone are substantially
increasing the use of open source GIS.


A Hybrid Model 


Proprietary software vendors may adopt a hybrid software approach,
where they interact with open software and systems. This has taken
many guises. Some may simply support standards, and ensure their
systems may access and generate industry standard data forms. But a
fuller approach provides the code in a mix of open and proprietary
parts. Base code may be provided free, with a charge for extensions or
some set of additional capabilities.

Alternately, there may be charges for the base code, but enough source
code or adherence to open standards that open source extensions can be
easily added later. This allows for the development of an "ecosystem"
of extensions around a base application, both proprietary and open
source.


Summary 


GIS are a dynamic collection of conceptual models, tools, and methods
that use spatial data. As such, they will continue to evolve. What
becomes standard practice in the future may be quite different from
the methods we apply today. However, the fundamental set of knowledge
will remain unchanged. We will still gather spatial and attribute
data, adopt a spatial data model to conceptualize real world entities,
and use map coordinates to define positions in space. The coordinates
are likely to remain based on a standard set of map projections, and
we will combine the spatial data of various classes of entities to
solve spatial problems. This book is an attempt to provide a
foundation to effectively use spatial analysis tools. I hope it has
provided enough information to get you started, and has sparked your
interest in learning more.
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Suggested Reading 


The World Wide Web is the best source for information about new developments in GIS,
so nearly all the suggested readings are websites. They offer
the most current information.

www.gislounge.com

www.nasa.gov, general NASA entry point

www.gis.com, an ESRI-sponsored website, general information

www.usgs.gov, public domain data from the USGS

www.usgs.gov/centers/eros, another common USGS entry point for image and GIS data

www.epa.gov/geospatial/

www.opengeospatial.org, Open GIS Consortium

https://catalog.data.gov

www.gpsworld.com/

www.digitalglobe.com, high-resolution satellite data

www.gisuser.com

www.directionsmag.com

www.u-blox.com/en/blogs/insights/five-key-trends-gps


Study Questions


15.1 - Which of the described new technologies is likely to have the largest impact in
GIS over the next five years? Why?


15.2 - What are areas of spatial data entry, analysis, output, or storage that are in dire
need of innovation or new and better methods? What is a major bottleneck to further
advancement of spatial information science and technology?


15.3 - What is Open Source GIS? How will this change spatial computing?


Appendix A: Glossary

Terms used in GIS and Spatial Data Development and Analysis

Accuracy: The nearness of an observation or estimate to the true value.


Active remote sensing system: A system that both emits energy and records the energy returned by
target objects.


Adaptive sampling: A method to increase sampling efficiency by increasing the spatial sample
frequency in areas with higher spatial variability.


Adjacency: Two area objects that share a bounding line are topologically adjacent. 


Affine coordinate transformation: A set of linear equations used to transform from one Cartesian
coordinate system to another. The transformation applies a scaling, translation, and rotation.
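
As an illustration of the affine entry, the sketch below writes the transformation as two linear equations with six coefficients. The function names, the coefficient order, and the helper that builds coefficients from scale, rotation, and shift values are assumptions made for this example, not notation taken from the text.

```python
import math

def affine(x, y, a0, a1, a2, b0, b1, b2):
    """Apply x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y."""
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y

def coefficients(scale_x, scale_y, angle_deg, shift_x, shift_y):
    """Build the six coefficients from scale, counterclockwise rotation, and shift.

    Hypothetical helper: the parameterization is one common choice among several.
    """
    t = math.radians(angle_deg)
    return (shift_x, scale_x * math.cos(t), -scale_y * math.sin(t),
            shift_y, scale_x * math.sin(t), scale_y * math.cos(t))

# Scale by 2, rotate 90 degrees, then shift by (10, 20).
coeffs = coefficients(2.0, 2.0, 90.0, 10.0, 20.0)
x_new, y_new = affine(1.0, 0.0, *coeffs)
```

In practice the six coefficients are usually estimated from control points by least squares rather than supplied directly.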


Almanac: Important system information sent by each GPS satellite, and recorded by a GPS
receiver to obtain current satellite health, constellation status, and other information helpful
for GPS positioning.


Application server: A middle tier in a common database architecture that passes requests for data
between the other levels.


Arc: A line, usually defined by a sequence of coordinate points.


ArcGIS: A GIS software package produced by Environmental Systems Research Institute, Inc., of
Redlands, California.


Area feature: A polygon, collection of contiguous raster cells, or other representation of a bounded
area. The feature is characterized by a set of attributes and has an inside and an outside.


ASCII: American Standard Code for Information Interchange, a set of numbers associated with a
symbol used in information storage and processing. Numbers are between 0 and 255, and
may be represented by a single byte of data.


Aspect: The direction of steepest descent on a terrain surface. 


Atmospheric delay: A change in the speed of light, and more specifically GNSS signal speed,
when passing through the atmosphere.


Atmospheric distortion: Image displacement due to the bending of light as it passes through the
atmosphere.


Attribute: Non-spatial data associated with a spatial feature. Crop type, value, address, or other
information describing the characteristics of a spatial feature are recorded by the attributes.


Autocad Geospatial: A suite of GIS software systems produced by Autodesk, Inc. of San Rafael,
California.


AVHRR: Advanced Very High Resolution Radiometer, a satellite sensor that provides visible through
thermal infrared images of the globe each day. The system has up to a 1.1 km resolution
and provides daily global coverage.


Bandwidth: A parameter used in kernel mapping to affect the influence, or spread, of each point
observation.


Base station: GNSS recording station over a precisely surveyed location, used in differential
correction.


Beacon receiver: A GNSS receiver capable of decoding beacon base station signals transmitted by
the U.S. Coast Guard beacon stations.

Bearing: A direction, usually specified as a geographic angle measured from some base line, e.g., true
north.


Beidou: A Chinese GNSS system.


Bench mark: A monumented, precisely surveyed location for which coordinates are known to a 
high degree of accuracy. 


Bilinear interpolation: A method for calculating values for a grid location based on a linear
combination of nearby grid values.
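
A minimal sketch of bilinear interpolation follows, assuming the four nearest grid values sit at the corners of an axis-aligned rectangle. The function name and argument order are hypothetical and chosen only for this example.

```python
def bilinear(x, y, x0, y0, x1, y1, q00, q10, q01, q11):
    """Interpolate a value at (x, y) inside the rectangle (x0, y0)-(x1, y1).

    q00 is the value at (x0, y0), q10 at (x1, y0),
    q01 at (x0, y1), and q11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)   # fractional position along x, 0 to 1
    ty = (y - y0) / (y1 - y0)   # fractional position along y, 0 to 1
    bottom = q00 * (1 - tx) + q10 * tx   # linear blend along the lower edge
    top = q01 * (1 - tx) + q11 * tx      # linear blend along the upper edge
    return bottom * (1 - ty) + top * ty  # linear blend between the two edges

center_value = bilinear(0.5, 0.5, 0, 0, 1, 1, 10, 20, 30, 40)
```

At the exact center of the cell all four corners receive equal weight, so the result is the plain average of the four values.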


Binary classification: A classification of spatial objects into two classes, typically denoted by a
0 class and a 1 class.


Binary number: A number represented in a numbering system with two possible digits, 0 and 1, with
successive columns in powers of two, e.g., 1 represents 1, 10 represents 2, 11 represents 3,
100 represents 4, etc.
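
The column-by-column reading of a binary number can be sketched as below. The function name is hypothetical; Python's built-in `int(text, 2)` performs the same conversion.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its decimal value."""
    total = 0
    for digit in bits:
        if digit not in ("0", "1"):
            raise ValueError("binary numbers use only the digits 0 and 1")
        # Shifting left by one column doubles the running total.
        total = total * 2 + int(digit)
    return total

values = [binary_to_decimal(b) for b in ("1", "10", "11", "100")]
```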

Binary operation: A spatial operation with two inputs. 


BLOB: A Binary Large Object, generally a digital object or file stored in or linked to in a
spatial database, e.g., an image or video.

Bit: A binary digit. A bit has one of two values, 0 or 1. This is the smallest unit of
digital information storage and the basic building block from which all other computer data
are built.

Boolean algebra: Conditions used to select features with set algebraic conditions, including and, or,
and not conditions.


Boundary generalization: Simplification of the boundary lines that define features.


Buffer: A buffer area is a polygon or collection of cells that are within specified proximities of a set
of features. A buffer operation is one that creates buffer areas.


Bundle adjustment: The removal of geometric distortion and production of orthophotographs
from a set of aerial images.


Byte: A unit of computer storage consisting of 8 binary digits. Each binary digit may hold a zero or
a one. A byte may store up to 256 different values.


C/A code GPS: Coarse acquisition code, a GPS signal used for rapid, relatively low accuracy
positional estimates. Accuracies without further corrections are typically from a few to tens of
meters.


CAD: Computer-aided design systems.
Related to GIS in that coordinate information is input, manipulated, and output. These systems
often do not store map projected coordinates, and do not have sophisticated attribute
entry and manipulation capabilities.

Cadastral: With reference to property lines or ownership. For example, a cadastral layer usually
contains property lines and a cadastral survey is the survey of property lines.


Candidate key: A column or columns in a relational table that meets the requirements for a key,
primarily that it uniquely identifies every row in the table.


Carrier phase GPS: Relatively slow but accurate signal used to estimate position. Position may be
determined to within a few centimeters or better.


Cartesian coordinate system: A right-angle two or three-dimensional coordinate system. Axes
intersect at 90 degrees, and the interval along each axis is linear.


Cartographic modeling: The combination of spatial data layers through the application of spatial
operations.


Cartographic object: A digital representation of a real-world entity.


Cartometric map: A map produced such that the relative positions of objects depicted are spatially
accurate, within the limits of the technology and the map projection used.


Cell dimension: The edge length of square cells used in raster data sets. 


Centroid: A central point location for an area feature, often defined as the point with the lowest
average distance to all points that define the area boundary.

Characteristic hull: A polygon boundary that attempts to include the densest concentration of a set
of observed points, often used in home range analysis.


Choropleth map: A map of polygons with color assigned as a gradient that depicts classified levels
of a variable. These are used for population density, average income, health risk, or
other variables mapped by administrative boundaries.

Classification: A categorization of spatial objects based on their properties.

Client: A program that requests data from a server.

Clip (overlay): The vertical combination of two data layers, with the output restricted to the
extent of a clip layer or clip area.


Cloud computing: Utilizing computer processing that is remotely located, typically on a networked
array of distant computers accessed through the World Wide Web.


Cluster sampling: A technique of grouping samples, to reduce travel time among samples while 
maintaining sample number. 


Code phase GPS: see C/A code. 


COGO: Coordinate Geometry, the entry of spatial data via coordinate pairs, usually obtained from
field surveying instruments.


COMPASS: Chinese satellite-based positioning system.


Concatenated key: The use of two or more table columns as a key in a relational database management
system.


Conformal coordinate transformation: A registration that requires scale changes to be equal in
the x and y directions.


Conformal projection: A map projection is conformal when it preserves shape for some portions of
the map.


Conic projection: A map projection that uses a cone as the developable surface.


Connectivity: A record or representation of the connectedness of linear features. Two linear features
or networks are connected if they may be traversed without leaving the network.


Continuous surface: A variable or phenomenon that changes gradually through two-dimensional
space, e.g., elevation or temperature.


Contour line: A line of constant value for a mapped variable.


Control points: Point locations for which map projection and database coordinate pairs are known
to a high degree of accuracy. Control points are most often used to convert digitized coordinates
to standard map projection coordinates.


Convergent circle: A circle used in defining a facet for a triangulated irregular network, that passes
through three points, and does not contain any other points.


Convex hull: The polygon that completely contains a set of points and that has no exterior
angles smaller than 180°.
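
One way to compute a convex hull is Andrew's monotone chain method, sketched below. The text does not name an algorithm, so this choice, the function names, and the tuple-based point format are assumptions for illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b.

    Positive for a counterclockwise turn, negative for clockwise, zero if collinear.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counterclockwise order, starting at the lowest x."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]

hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

The interior point (1, 1) is discarded because including it would force a concave corner in the boundary.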

Coordinates: A pair or triplet of numbers used to define a position in space.

Coordinate transformation: The conversion or assignment of coordinates from a non-projected
coordinate system to a coordinate system, typically via a system of linear mathematical
equations.

Core area: The central or primary concentration for a set of points.

Cost surface: A spatial depiction of the cost of traveling among locations.


Cubic convolution: A method of calculating grid values based on a weighted combination of 16
nearby grid cells.


Cylindrical projection: A map projection that uses a cylinder as the developable surface.

Dangle: An unintended overshoot in a line segment when crossing another line segment.


Data independence: The ability to make changes in data structure in a database management
system that are transparent to users or applications that use the data.


Data model: A method of representing spatial and aspatial components of real-world entities on a
computer.


Database management system (DBMS): A collection of software tools for the entry, organization,
storage, and output of data.


Data Pane: Area of a map that contains graphic depictions of an area, usually the largest portion of
a map.


Datum: A set of coordinate locations specifying horizontal positions (for a horizontal datum) or
vertical positions (for a vertical datum) on the Earth.


Datum adjustment: A re-calculation of a datum based on additional measurements.


Datum realization: The outcome of a datum re-adjustment, a specific, defined datum surface and 
set of datum points. 


Datum shift: The change in horizontal or vertical point location that results from a datum
adjustment.


Datum transformation: A method or set of equations that allows the calculation of a point location
in one datum based on coordinates expressed in a different datum.


Declination: The angle between the bearing towards True North and the bearing towards Magnetic
North.


Delaunay triangles: The set of triangles formed in a triangulated irregular network, connecting
points to the nearest points to create triangles while ensuring that the triangle edges don't
cross, and are formed by convergent circles.


DEM: Digital Elevation Model, a raster set of elevations, usually spaced in a uniform horizontal
grid.


Developable surface: A geometric shape onto which the Earth sphere is cast during a map
projection.

Digital terrain model: A digital representation of elevation, including DEMs, TINs, and other
digital representations.

Diaphragm: A camera component that functions like the iris of the human eye, to control the
amount of light available to fall on the film or CCD recording surface, and to improve focus.


Differential GNSS: GNSS positioning based on two receivers, one at a known location and one at a
roving, unknown location. Data from roving receivers are corrected by the difference error
computed at the known location.

Digital map: An electronic, graphic depiction of an area.


Digitizing: To convert paper or other hardcopy maps to computer-compatible, stored
data.


Digitizing table: A device with a flat surface and input pointer used to digitize hardcopy maps.

Dilution of precision (DOP): See position dilution of precision (PDOP).

Dissolve: An operation that removes lines separating adjacent polygons that are considered equal,
based on some characteristic or measure. A dissolve operation is typically applied based on
the values of a variable.


Domain: The range of values a variable may take.


DORIS: Doppler Orbitography and Radiopositioning Integrated by Satellite, precise measurement of a
position on the Earth via Doppler shifts in satellite signals.


Dot density maps: Maps with dots placed inside of polygons in proportion to the numeric value of a
variable.


DRG: Digital Raster Graphics, a digital version of USGS fine- to medium-scale maps.


Dual-frequency GPS receiver: A receiver capable of measuring the L1 and L2 broadcast signals.

Dynamic height: A point's geopotential number, divided by a normalizing constant. It is used
primarily when measuring water levels or hydraulic heads.

Easting: The axis approximately parallel to lines of equal latitude in UTM and a number of other
standard map projections.

Electromagnetic spectrum: A range of energy wavelengths, from X-ray through radar
wavelengths.

Ellipsoid: A mathematical model of the shape of the Earth that is approximately the shape of a
flattened sphere, formed by rotating an ellipse.


Ellipsoidal height: Height measured from an ellipsoidal surface to a point on the surface of the
Earth.


Endlap: The end-to-end overlap in aerial photographs taken in the same flight line.

Entity: A real world item or phenomenon that is represented in a GIS or database.


Ephemeris: Information on GNSS satellite orbits, required by GNSS receivers to compute satellite
position, range distance, and receiver position.


Epoch: A specific date to which GNSS data are normalized, accounting for drift in coordinates
through time due to tectonic movement.


Epsilon band: A band surrounding a linear feature that describes the positional error of the
feature.

Equal-area classification: A classification method that assigns classes such that each class
corresponds to an equal area.


Equal interval classification: A classification method that assigns an equally spaced set of classes
across the range of a variable.
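
The equal interval rule can be sketched as a short function that divides the range of a variable into classes of identical width. The function name and the zero-based class numbering are assumptions for this example, and the sketch assumes the high value is larger than the low value.

```python
def equal_interval_class(value, low, high, n_classes):
    """Return the zero-based class index of value under equal interval classification."""
    if not low <= value <= high:
        raise ValueError("value lies outside the range of the variable")
    width = (high - low) / n_classes       # every class spans the same width
    index = int((value - low) / width)
    # The maximum value falls on the upper edge of the last class.
    return min(index, n_classes - 1)

# Four classes across 0 to 100: widths of 25 units each.
classes = [equal_interval_class(v, 0, 100, 4) for v in (10, 25, 60, 100)]
```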


Equipotential surface: A surface along which the gravitational potential is equal, perhaps best
understood as where the pull of gravity is equal across the surface.


Erase function: A vector spatial operation, typically that "clips out" and discards the area in one
layer corresponding to the polygon boundaries in another data layer.

ERDAS: A GIS and remote sensing image processing software package owned and developed by
Leica Geosystems, St. Gallen, Switzerland.


ETM: Enhanced Thematic Mapper, a scanner carried on board Landsat 7, providing image data
with resolutions of 30 meters for visible through mid-infrared, 15 meters panchromatic, and
60 meters for thermal wavelengths.


Facet: A triangular face in a TIN. 


False easting or northing: A number added to coordinates in a map projection, usually to avoid
negative coordinates.


Feature: An object or phenomenon in the landscape, or its digital representation.


Feature generalization: The incomplete representation of shape defining coordinates for entities
represented in a GIS.


Fiducial marks: Also known as fiducials, precisely scribed marks that are recorded near the edges
of aerial images, and used to remove systematic camera distortion and to register images.


FIPS: Federal Information Processing Standards code, a set of numbers for defined political or
physical entities in the United States. There are FIPS codes for each state and county.


First Normal Form: A set of requirements for a relational database table, primarily that there be no
repeated columns, defined as columns that represent the same kind of information. For example,
in a table intended to represent families, repeated columns for children would violate the
requirement for first normal form.


Friction surface: A raster surface used in calculating variable travel costs through an area. The
friction surface represents the cost per unit distance to travel through a cell.


Flatbed scanner: An electronic device used to record a digital image of a hardcopy map or image.


Flow direction: The direction water will flow from a point, usually an azimuth or bearing angle
assigned to a raster cell.


Foreign key: A column in a relational database table that is a candidate key, and used to join the
table to a different relational table.


Friction surface: See cost surface.

FTP: File Transfer Protocol, a standard method to transfer files across a computer network.

Functional dependency: A property of items in a database table, in which the value of one item
determines the corresponding value of the second item.

Galileo: A European-based GNSS system.


Generalization: The simplification of shape or position that inevitably occurs when features are
represented in a GIS.

Geocentric: A measurement system that uses the center of the Earth as the origin.


Geocoding: The process of assigning a geographic or projection coordinate to a data item that is
based on a street address, town, and state or county.

Geodetic datum: A reference system against which horizontal and/or vertical positions are defined.
Typically consists of a sphere or ellipsoid and a set of point locations precisely defined with
reference to that surface.

Geodesy: The science of measuring the shape of the Earth and locations on or in the Earth. 


Geoid: A measurement-based model of the shape of the Earth. The geoid is a gravitational
equipotential surface, meaning a standard surface of equal gravitational pull. The geoid is used
primarily as a basis for specifying terrain or other heights.

Geoidal height: The distance measured normal to the ellipsoid surface from an ellipsoid to a
geoid.


Geographic North: The northern axis of rotation of the Earth. Also known as True North, and by
definition lies at 90° north latitude.


GeoMedia: A GIS software package produced by Intergraph, Inc., of Huntsville, Alabama. 

Geopotential: The gravitational potential generated by the mass of the Earth, defined as the work to
move a unit mass object from infinity to that point. It is by convention a negative number that
decreases (becomes more negative) as one grows closer to the mass center of the Earth.


GIS: Geographic information system. A GIS is a computer-based system to aid in the collection,
maintenance, storage, analysis, output, and distribution of spatial information.


GML: Geography Markup Language, a standard method for documenting and transferring spatial
data.


GLONASS: Global Navigation Satellite System. A Russian developed and maintained system for
coordinate measurement and positioning.


Global operation: A spatial operation where the output location, area, or extent comes from
operations on the entire input area or extent.


GNSS: Global Navigation Satellite System. A constellation of satellites plus a ground control
segment that allows precise location on or near the Earth. This includes GPS, GLONASS, and
other satellite navigation systems.


Gnomonic projection: A map projection with the projection center placed at the center of the
Earth.

GRASS: An open-source GIS software system.


Graticule: Lines of latitude and longitude drawn on a hardcopy map or represented in a digital
database.


Gravimeter: An instrument for measuring the strength of the gravitational field. 


Great circle: A circle on the surface of the globe that splits the globe into two equal parts, and is on
a plane that passes through the center of the globe.

Great circle distance: The shortest distance between two points on the surface of the Earth. This
distance follows a great circle route, defined as the route on the surface defined by a plane
that intersects the starting and ending points and the center of the Earth.
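
A great circle distance can be approximated with the haversine formula, sketched below. The sketch assumes a spherical Earth with a mean radius of 6371 km, which is a simplification, since the glossary elsewhere describes ellipsoidal models. The function name and default radius are assumptions for this example.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in kilometers between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating point rounding before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

# A quarter of the way around the equator.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

On the assumed sphere, a quarter of the equator equals one quarter of the circumference, which gives a simple check on the formula.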

Grid, map: A network of constant X- and Y-coordinate lines on a map.

Grid North: The direction parallel to the northing axis in a projected, Cartesian coordinate system.

Greenwich meridian: The line of equal longitude passing through the Royal Observatory in
Greenwich, England.


GRS80: Geodetic Reference System of 1980, an ellipsoid used for map projections in much of
North America.


Hardcopy map: A map printed on physical media, usually paper.

Height above ellipsoid, HAE: See ellipsoidal height.

Helmert transformation: A method to transform among horizontal datums.


Hierarchical data model: A method of organizing attribute data that structures values in a tree,
usually from general to more specific.


High Accuracy Reference Network (HARN): A set of state-specific horizontal datums realized in
the early years of GPS measurements.


High-pass filter: A raster operation that identifies large or high-frequency differences between
cells.


Hydrography: Geographic representation of water features.

Hypsography: Geographic representation of height features.


Idrisi: A GIS from Clark University, widely adopted, with particular strengths in analysis.


Ikonos: A high resolution imaging satellite system. Ikonos provides 1-meter panchromatic and
3-meter multispectral image data.


Inner join: A combination of two data tables in a database management system based on a key
column. The output table combines rows by matching values in the key column, and saves only
rows that have matching key values in both tables.
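
The inner join described above can be sketched in plain Python with tables held as lists of dictionaries. The table contents, column names, and function name are hypothetical; a database management system would normally perform this step with a SQL JOIN.

```python
def inner_join(left, right, key):
    """Combine rows from two tables, keeping only rows whose key value appears in both."""
    # Index the right table by key value so each lookup is fast.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

parcels = [{"id": 1, "owner": "A"}, {"id": 2, "owner": "B"}, {"id": 3, "owner": "C"}]
soils = [{"id": 1, "soil": "loam"}, {"id": 3, "soil": "clay"}, {"id": 4, "soil": "sand"}]
result = inner_join(parcels, soils, "id")
```

Parcel 2 and soil record 4 are dropped because neither has a matching key value in the other table.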


International Terrestrial Reference Frame (ITRF): A geocentric coordinate reference frame that
follows an international standard for specifying Earth coordinates. Defines an origin,
ellipsoidal shape, and X, Y, and Z coordinate directions.


International Terrestrial Reference Service (ITRS): A system of people, institutions, protocols,
hardware, and software for organizing and taking the measurements needed for calculating
ITRF realizations.


Ionospheric delay: The change in travel time of an electromagnetic signal when passing through
the ionosphere. Most applicable in GIS to uncertainty in GNSS positioning due to uncertain
signal travel times.

Instantaneous field of view (IFOV): The area or angle sensed by an imaging system, or sensing
component such as the pixel or lens, of the system.
Interpolation: The estimation of variables at unsampled locations from measurements at sampled
locations. Interpolation methods are usually understood to use a formula with all parameters
pre-determined, meaning that parameter values used in the formula do not depend on
the data values.

Intersection (overlay): The vertical combination of two data layers, typically restricted to the
extent of one data layer but preserving the data contained in both data layers for that extent.


Interval/ratio scale: A measurement scale that records both order and absolute difference in value
for a set of variables.


Isopleth map: A map depicting lines of constant value for a variable, also known as a contour map.

Items: Variables or attributes in a data table, typically viewed as the columns of the table. These are
the types of essential characteristics used to describe each feature in the geographic data set,
e.g., area, depth, and water quality for a lakes data set.


Idrisi: A GIS system developed by the Graduate School of Geography of Clark University,
Worcester, Massachusetts.


IDW: Inverse Distance Weighted interpolation, a method of estimating values at unsampled loca- 
tions based on the value and distance to sampled locations. 
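
A minimal sketch of inverse distance weighting follows. The exponent of 2 is a common default but is an assumption here, as are the function name and the (x, y, value) sample format.

```python
import math

def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value            # the location coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(1.0, 0.0, samples)
```

A larger exponent gives nearby samples more influence, while an exponent near zero approaches a simple average of all samples.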


Infrared image: An image that records reflectance in the near infrared wavelengths, typically
including 0.7 to 1.1 micrometers.


JPEG: An image compression format.

Kernel: An arrangement of cells and values used as a multiplication template in raster analysis.


Kernel mapping: A method of identifying core areas, concentrations, or density of occupation,
based on "stacking" kernels that represent occurrence frequency at observed locations.


Key: An item or variable in a relational table used to uniquely identify each row in the table.

Kriging: An interpolation method based on geostatistics, the measurement of spatial
autocorrelation.


Lambert conformal conic: A common conic map projection.


Law of sines: A trigonometric relationship that allows the calculation of unknown triangle edge
lengths from known angles and edge lengths.
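
The law of sines states that each edge length divided by the sine of its opposite angle is the same for all three edges of a triangle. The sketch below applies it to find one unknown edge; the function name and argument order are hypothetical.

```python
import math

def side_from_law_of_sines(known_side, known_angle_deg, target_angle_deg):
    """Length of the edge opposite target_angle, given one edge and its opposite angle.

    Uses a / sin(A) = b / sin(B), so b = a * sin(B) / sin(A).
    """
    return (known_side * math.sin(math.radians(target_angle_deg))
            / math.sin(math.radians(known_angle_deg)))

# In a 30-60-90 triangle whose shortest edge is 1, the edge opposite 90 degrees is 2.
hypotenuse = side_from_law_of_sines(1.0, 30.0, 90.0)
```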


Leveling surveys: Surveys used to measure the relative height difference between sets of points. 


Lidar: Laser detecting and ranging, the use of pulsed laser measurements to identify the height,
depth, or other properties of features.


Line smoothing: Adding vertices and changing the shape of a line, usually during digitizing, to
remove discrete angles and create a bending arc.


Line snapping: Automatic movement of digitized points to a nearby line when the point location is
within a specified snap distance.


Line thinning: Removing line vertices to reduce file size while maintaining most feature shape 
information. 


Linear referencing: See geocoding. 


LIS: A Land Information System, a name originally applied for GIS systems specifically developed 
for property ownership and boundary records management. 


Local operation: A spatial operation where the output location, area, or extent comes from
operations on that same extent.


Logical model: A conceptual view of the objects we portray in a GIS.

Longitude: Spherical coordinates of Earth location that vary in an east-west direction.


Magnetic North: The point in the northern hemisphere that is the pole of the Earth's magnetic field.


Manifold: GIS software package produced by CDA International, of Carson City, Nevada.

Map algebra: The combination of spatial data layers using simple to complex spatial operations.

Map generalization: The approximation of real features on a map or in a spatial database, where
the physical shape or location does not completely represent all the detail of the features.


Map projection: A systematic rendering of features from a spheroid or ellipsoid representing the
3-dimensional Earth to a map surface.


Map scale: The ratio of a distance on a map to the corresponding distance on the Earth.
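
The arithmetic behind a map scale can be sketched as follows. The function names and the choice of centimeters on the map and meters on the ground are assumptions for this example.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Ground distance in meters for a map distance in centimeters at scale 1:denominator."""
    return map_distance_cm * scale_denominator / 100.0   # 100 cm per meter

def scale_denominator(map_distance_cm, ground_m):
    """Scale denominator implied by a map distance and its matching ground distance."""
    return ground_m * 100.0 / map_distance_cm

# 2 cm on a 1:24,000 map corresponds to 480 m on the ground.
distance = ground_distance_m(2.0, 24000)
```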

MapInfo: GIS software package produced by MapInfo, Inc., of Troy, New York.

Mean center: A measure of the central location of a set of objects or observations, based on the
mean x and y coordinates for all observations.

Mean circle: An estimate of a core area via a circle centered on the mean center, with a radius
derived from the observed points, for example, the standard deviation.
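
The mean center and one possible mean circle radius can be sketched as below. Using the standard distance (the root mean square distance of points from the mean center) as the radius is an assumption for this example; the glossary only says the radius is derived from the observed points. The function names are hypothetical.

```python
import math

def mean_center(points):
    """Mean x and mean y of a list of (x, y) points."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def standard_distance(points):
    """Root mean square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))

observations = [(0, 0), (2, 0), (2, 2), (0, 2)]
center = mean_center(observations)
radius = standard_distance(observations)
```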

Meridian: A line of constant longitude.


Magnetic North: The point where the northern lines of magnetic attraction enter the Earth.
Magnetic North does not occur at the same point as "True" or Geographic North. In the absence
of local interference a compass needle points towards magnetic north. The magnetic north
pole is currently located in northern Canada.


Metadata: Data about data, describing the properties of a spatial data set, including the
coordinate system, origin, lineage, and other characteristics
needed for effective evaluation and use of the data.


Metes and bounds survey: A survey method based on distance and sometimes angle measurements
from known or monumented points.

Minimum distance digitizing: Digitizing in stream mode in which points less than a minimum
distance apart are not allowed.

Minimum mapping unit (MMU): The smallest area resolved when interpreting an aerial or
satellite image.

Moving window: A usually rectangular arrangement of cells that shifts in position across a raster
surface. At each position an operation is applied using the cell values currently encountered
by the moving window.
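
A moving window operation can be sketched with a square window that computes the mean of the cells it covers. The function name, the list-of-lists raster format, and the choice to leave edge cells as None are assumptions for this example; real software handles edges in several different ways.

```python
def moving_window_mean(raster, size=3):
    """Mean of each size-by-size window, written to the window's center cell.

    Cells too close to the raster edge for a full window are left as None.
    """
    half = size // 2
    rows, cols = len(raster), len(raster[0])
    output = [[None] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [raster[r + dr][c + dc]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)]
            output[r][c] = sum(window) / len(window)
    return output

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
smoothed = moving_window_mean(grid)
```

Replacing the mean with a maximum, a range, or a weighted sum over a kernel gives other common neighborhood operations.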

MSS: Multi-spectral Scanner, an early satellite imaging scanner carried by Landsat satellites.


Modifiable areal unit problem: The dependence of aggregate area statistics on the size and shape
of the aggregation unit.


Molodensky transformation: A method to transform among geodetic datums.

Monte Carlo simulation: A method of estimating the variability in a process or model by adding
small differences to the input data or parameters over a number of model runs and observing
how these small differences change the results.

Multipart feature: A vector feature, usually of polygons, where multiple, separate geographic
entities are grouped and treated as one, and associated with one row in the associated attribute
table.

Multispectral: An image, film, or system that records data collected from multiple wavebands.


Multi-tier architecture: A database management system design with multiple levels.


Nadir point: The point directly below the aircraft, usually near the center of an aerial image. 


NAD27: North American Datum of 1927, the adjustment of long baseline surveys to establish a
network of standardized horizontal positions in the early 20th century.

NAD83: North American Datum of 1983. The successor to NAD27, using approximately an order
of magnitude more measurements and improvements in analytical models and computing
power. The current network of standard horizontal positions for North America.


NASS-CDL: National Agricultural Statistical Service Crop Data Layer, annually produced raster 
data sets of crop categories for United States farmland. 


NAVD29: North American Vertical Datum of 1929, an adjustment of vertical measurements to 
establish a network of heights in the early 20th century. 


NAVD88: North American Vertical Datum of 1988, the successor vertical datum to NAVD29.


NATRF2022: North American Terrestrial Reference Frame of 2022. The successor datum to
NAD83(2011).


Neatline: A line containing all elements that make up a map.


Neighborhood operation: A spatial operation where the output location, area, or extent comes from
operations on an area larger than, and usually adjacent to, the input extent.


Network: A connected set of line features, often used to model resource flow or demand through 
real-world networks such as road оё ver systems 


Network center: A location on a network tbe provides ог requires resources. 


NLCD: National Land Cover Data set, a Landsat Thematic Mapper (TM) based classification of 
land cover for the United States.


NOAA: National Oceanic and Atmospheric Administration, the U.S. government agency that oversees
the development of national datums.


Node: An important point along a line feature, where two lines meet or intersect. 


Node snapping: Automatic movement of digitized points to a nearby node when the point location is
within a specified snap distance.


North arrow: Graphic, usually an arrow, that shows the direction to geographic, magnetic, or grid
north on a map.


Northing: The axis in the approximately north-south direction in UTM and other standard coordinate
systems.


Normal forms: A standard method of structuring relational databases to aid in updates and remove 
redundancy 


N-tuple: A group of attribute values in a database management system.
NWI: National Wetlands Inventory data compiled by the U.S. Fish and Wildlife Service over most
of the United States. These data provide first-pass indications of wetland type and extent.


Object: See cartographic object 


Object-oriented data model: A data model that incorporates encapsulation, inheritance, and other 
object-oriented programming principles. 


Open source software: Computer programs that provide the source code to any user, typically easily
accessible through a web portal.


Operation, spatial: The manipulation of coordinate or attribute data.


Optical axis: A ray, approximately perpendicular to the film or image plane in a camera and parallel
to the center of the lens barrel. It may be thought of as the pointing direction of the camera.


Ordinal attribute: A variable that contains a ranking.


Ordinal scale: A scale that represents the relative order of values but does not record the magnitude
of differences between values.


Orthogonal: Intersecting at a 90 degree angle. 


Orthographic view: Horizontal placement as would be seen from a vertical viewpoint at infinity.
There is no terrain or tilt perspective distortion in an orthographic view.


Олара projection: Aia jection wih tn poilon eae on nib dice fo the 


Orthoimage: See orthophotograph. 

Orthometric height: Height measured from the Geoid surface to a point on the surface of the Earth. 

[узинин Ata anh neg pst 

MES TEES 
z 


Outer join: A combination of two data tables in a database management system based on a key
column. The output table appends those rows in a second table that match values in the key
column. Null values are placed in joined-table columns from the second table where there is no
match to the first table.
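
As a rough illustration of the outer join just described, the sketch below uses Python's built-in sqlite3 module. The table names (parcels, soils), the column names, and the sample rows are all hypothetical and chosen only for this example.

```python
import sqlite3


def left_outer_join_demo():
    """Join a first table to a second table on a key column, keeping unmatched rows."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # Hypothetical tables: every parcel has an owner, only some have a soil record.
    cur.execute("CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner TEXT)")
    cur.execute("CREATE TABLE soils (parcel_id INTEGER, soil_type TEXT)")
    cur.executemany(
        "INSERT INTO parcels VALUES (?, ?)",
        [(1, "Smith"), (2, "Jones"), (3, "Lee")],
    )
    cur.executemany(
        "INSERT INTO soils VALUES (?, ?)",
        [(1, "loam"), (3, "clay")],
    )
    # Parcel 2 has no match in soils, so its soil_type comes back as NULL (None).
    rows = cur.execute(
        "SELECT p.parcel_id, p.owner, s.soil_type "
        "FROM parcels AS p LEFT OUTER JOIN soils AS s "
        "ON p.parcel_id = s.parcel_id "
        "ORDER BY p.parcel_id"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in left_outer_join_demo():
        print(row)
```

The row for parcel 2 is kept in the output even though the second table has no matching key, which is the behavior that distinguishes an outer join from an inner join.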


Overlay: The "vertical" combination of two or more spatial data layers.
Overshoot: A digitized line that extends past a connecting line. 


Panchromatic: An image, film, or system that records in only one wavelength band, resulting
in gray scale (black and white) images.
Parallax: The relative shift in position of features due to a shift in viewing location. 


Тн nena een: Aspen tht dos mat ek ba doni nerd om tt 


"DOP: Poli Tibi of eon «em of i edt pron балы o be aaa 
geometry hen taking GPS readings PDOPs betwee | and мер most applica- 
ions, and lower is better. 


Perspective convergence: The apparent decrease in inter-object distance as the objects are farther
away, for example, the apparent convergence of the railroad rails as they recede into the distance.
Perspective view: A view on a location that includes some relief or perspective distortion, meaning
tbe location of objects may be distort er late distance the camera vais consi 
ls. 


Pixels: Picture elements that make up an image, these are the individual grid cells that record or dis- 
play a brightness or color in an image. 


Plan curvature: Terrain curvature along a contour 


Planar topology: The enforcement of intersection for line and area features in a digital data layer. 
Each line crossing requires an explicit node and intersection. 


Plane surveying: Location surveying methods suitable under the assumption that the surveyed
lands form a planar surface, i.e., that distortions due to the Earth's curvature may be ignored.


Platen: The flat back portion of a film camera against which the film rests while an image is collected.


Plumb bob: A weight on a string held freely to determine the local vertical direction. 


Point mode digitizing: Manual digitizing in which the operator must press a mouse button or otherwise
indicate when a point should be sampled.


Pointer: An address stored in a data structure pointing to the next or related data elements. Pointers
are used to organize data and speed access. 


Polygon: A closed, connected set of lines that define an area. 


Polygon inclusion: An area different in some characteristic from the recorded attributes of the
polygon, but not resolved.


Pot tes techn eene ed d eee ERES Vedi ae ites 
bribe area ef иту» н ii iy oa 
КАШИ cad loe DOR: cc тер mest E LES pontem eckson” 
Precision: The repeatability of a method or process.


Primary key: A column or columns in a relational database table that is selected as the key, and that
nt ie ror a he ae iá 


Prime meridian: See Greenwich meridian.
Profile curvature: Terrain curvature in the direction of steepest descent.
Projection: See map projection.
Proximity function: See buffer. 


Public Land Survey System (PLSS): A land measurement system used in the western United
States of America to unambiguously define parcel location. 


Pyramiding, Raster pyramiding: Building rasters, often images, of successively coarser spatial 
resolution within an image, primarily to allow faster redraws when panning and zooming.


QGIS: An open-source GIS. 


Quad-trees: A raster data structure based on successive, adaptive reductions in cell size within a 
data layer to reduce storage requirements for thematic area data. 


Query: Requests or searches for spatial data, typically applied via a database management system.


Radial lens distortion: The displacement of objects in an image due to small lens imperfections, 
usually radially inward or outward. 


Random sample pattern: A sampling pattern where sample location is determined by a random
process.
Range distance: A measurement between locations when positioning, usually referring to
satellite-receiver distances in GNSS positioning.


Range pole: A pole used in surveying to raise a GNSS antenna, survey prism, or other survey 
instrument above the ground. Range poles are often used in GNSS data collection to raise an 
antenna and thereby obtain better PDOPs, and improved accuracy.

Raster data model: A regular grid cell approach to defining space. Cells are usually square.


Real time differential correction: GNSS positioning which relies on a radio link and external
positioning measurements to correct major GNSS errors in real time, and provide instant
improvements in accuracy.


Real time kinematic positioning (RTK): A form of real time differential correction of GNSS posi- 
tions. 


Record: A collection of attributes stored for a specific instance of an entity.
Reference frame: A well-defined, usually ellipsoidal, surface that serves as a basis for our datums.


Registration: The conversion or assignment of coordinates from а non-projected coordinate system 
Етрен = 


Relations: See relational table. 


Relational algebra: A set of operations on database tables specified by E.F. Codd for the consistent
manipulation of data in a database.


Relational table: A data table in a relational database management system. 


Relief displacement: Apparent horizontal distortion of features due to height differences relative to
the nadir point in a vertical aerial image.
Remote Sensing: Measuring or recording information about objects or phenomena without contacting
them.
Resampling: The recalculation and assignment of cell values when changing cell size and/or
orientation of a raster grid.


RMSE: Root Mean Square Error, a statistic that measures the difference between true and predicted
data values for coordinate locations.
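
A minimal sketch of how an RMSE for coordinate locations might be computed is given below. It assumes paired lists of true and predicted (x, y) coordinates and measures the straight-line positional error at each point; the function name and the sample values are illustrative only.

```python
import math


def rmse(true_xy, predicted_xy):
    """Root mean square of the positional errors between paired (x, y) points."""
    if not true_xy or len(true_xy) != len(predicted_xy):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (tx - px) ** 2 + (ty - py) ** 2
        for (tx, ty), (px, py) in zip(true_xy, predicted_xy)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Illustrative values: the first point is off by 5 units, the second is exact.
    surveyed = [(0.0, 0.0), (10.0, 10.0)]
    digitized = [(3.0, 4.0), (10.0, 10.0)]
    print(rmse(surveyed, digitized))
```

Squaring before averaging means a few large errors raise the RMSE more than many small ones.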


Rubbersheeting: The use of polynomial or other nonlinear transformations to match feature
geometry.


Run length coding: A compression method used to reduce storage requirements for raster data sets.
Value and number of sequential occurrences are stored.
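
The idea of storing a value and its number of sequential occurrences can be sketched as follows. This is a minimal illustration for a single raster row, not any particular software's storage format; the function names and cell values are assumptions made for the example.

```python
def run_length_encode(row):
    """Collapse a row of cell values into (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def run_length_decode(runs):
    """Expand (value, run length) pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


if __name__ == "__main__":
    raster_row = [9, 9, 9, 6, 6, 10, 10, 10, 10]
    encoded = run_length_encode(raster_row)
    print(encoded)
    print(run_length_decode(encoded) == raster_row)
```

The savings depend on the data: rows with long runs of identical values compress well, while rows where every cell differs from its neighbor take more space encoded than raw.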


Schema: A compact graphical representation of a database conceptual model, its entities, and the
relationships among them.


Scope: The spatial extent of input for a spatial operation.


Secant lines: Lines of intersection between a developable surface and a spheroid in a map
projection.


Selection operation: The identification of a set of objects based on their properties.
Semi-major axis: The larger of the two radial axes that define an ellipsoid.
Semi-minor axis: The smaller of the two radial axes that define an ellipsoid.
Semivariance: The variance between values sampled at a given lag distance apart.
Server: A computer or a program component that stores data, and provides subsets of data in
response to requests.


Set algebra: A method for specifying selection criteria based on comparison operators: less than,
equal to, greater than, and perhaps others.


Shaded relief map: A depiction of the brightness of terrain reflection with a given sun location.
Sidelap: Edge overlap of photographs taken in flightlines.


Singlepart feature: A vector data layer where every feature is individually represented by a point,
line, or polygon, and there is a corresponding row in the attribute table for each feature.


Shutter: A system for controlling the time or amount of light reaching a detecting surface.
Skeletonizing: Reducing the width of linear features represented in raster data layers to a single cell.


Sliver: Small, spurious polygons at the margins or boundaries of feature polygons that are an artifact
of imprecise digitizing or overlay.

Slope: The change in elevation over a change in location, usually measured over some fixed interval,
e.g., the change in height between two points 30 meters apart. Slope is usually reported as a
percent slope or as a degree angle.


SLR: Satellite Laser Ranging, a system for measuring position, particularly uplift and subsidence,
ion Bari Smile er redi Те = 


Snap distance: A distance threshold defined in digitizing or other spatial analysis. Point features,
ies e wane tape р е LS edm te cmq aaa Da. 


Snap tolerance: See snap distance. 


Snapping: Automatic line joins during vector digitizing or layer overlay. Nodes or vertices are


Spaghetti data model: Vector data model in which lines may cross without intersecting. 


patia datn mt А woy of орыш wel dota laa dana ойор he concept 
scription and relationships among spatial objects. 


Spatial object: A digital representation of a physical object or phenomenon.
Spatial operation: A logical, mathematical, selection, or other spatial process that transforms
ren did = 


Spatial resolution: The smallest feature that can be identified in a data set. 

Spectrum: see electromagnetic spectrum. 

Spherical coordinates: A coordinate system based on a sphere. Location on the sphere surface is
defined by two angles of rotation in orthogonal planes. The geographic coordinate system of
latitude and longitude is the most common example of a spherical coordinate system.

Spheroid: A mathematical model of the shape of the Earth, based on the equation of a sphere. 


Spirit leveling: An early leveling survey technique in which horizontal lines were established
between survey stations, and relative height differences determined by measured marks on 
leveling rods. 
Spline: A smoothed line or surface created by joining multiple constrained polynomial functions.
SPOT: Systeme Pour l'Observation de la Terre, a satellite imaging system providing 10 to 20 meter
resolution images.
SQL: Structured Query Language, a widely adopted set of commands used to manipulate relational
databases.
SSURGO: Fine resolution digital soil data corresponding to county level soil surveys in the United
States. Produced by the Natural Resources Conservation Service.

Standard parallels: Lines of intersection between a developable surface and a spheroid in a map
projection.

STATSGO: Coarse resolution digital soil data distributed on a statewide basis for the United States.
Most often derived from aggregation and generalization of SSURGO data.

State Plane Coordinates: A standardized coordinate system for the United States of America that is
based on the Lambert conformal conic and transverse Mercator projections. State plane
zones are defined such that projection distortions are maintained to be less than 1 part in
10,000.


Stereo pairs: Overlapping photos taken from different positions but of substantially the same area,
with the goal of using parallax to interpret height differences within the overlap area.
Stereographic projection: A map projection with the projection center placed at the antipode, the
Диде spa sel pel rn psn sean oat N p 
phe 

Stereophotographs: A pair or more of overlapping photographs that allow the perception of three
dimensions due to a perspective d


Stream mode digitizing: Point data collection via manual digitizing where the distance or time
interval between sampled locations is fixed. This removes the need for a button press by the
manual operator during digitizing.

Structured Query Language (SQL): A standard syntax for specifying queries to databases.

Survey station: A position occupied, and from which measurements are made, during a land survey.

Systematic sample: A sampling pattern with a regular sampling framework.

Terrestrial reference frame: The set of measured points and their calculated coordinates that are
used to define a specific, or realized, geodetic datum.


Thematic layer: Thematically distinct spatial data organized in a single layer, e.g., all roads in a
study area placed in one thematic layer, all rivers in a different thematic layer.

TIFF: Tagged Image File Format, a widely-supported image distribution format. The GeoTIFF
variant comes with image registration information embedded.

TIGER: Topologically Integrated Geographic Encoding and Referencing files, a set of structures
used to deliver digital vector data and attributes associated with the U.S. Census.

Time geographic density estimation: A method of estimating the location of a moving object from
a time sequence of observations. The method uses time intervals between observations along
with estimates of average, maximum, and/or minimum speed to estimate a probable
occupation region.

TIN: Triangulated Irregular Network, a data model most commonly used to represent terrain.
Elevation points are connected to form triangles in a network.


TM: Thematic Mapper, a high-resolution scanner carried on board later Landsat satellites. Provides 
information in the visible, near infrared, mid infrared, and thermal portions of the electro- 
magnetic spectrum. 

TNTmips: An image processing and GIS software package produced by MicroImages, Inc., of
Lincoln, Nebraska.


Topology: Shape-invariant spatial properties of line or area features such as adjacency, contiguity,
and connectedness, often recorded in a set of related tables.


Transverse Mercator projection: A common map projection based on a transverse cylinder.
Transaction manager: A component of a database management system that processes requests
from clients, and passes them to a server.
Traverse: A series of survey stations spanning a survey. Traverses are closed when they return to
the starting point, and open when they do not.
Trigonometric leveling: Measurement of vertical positions or height differences among points by
the collection of vertical angles and distance measurements.
Survey: Horizontal surveys conducted ia aset of interlocking angles, thereby pro- 
Tides pe рту to cach survey рош. This thd providet inherent intra 


тараны onan: 

Unary operation: An operation that has only one input.

Undershoot: A digitizing error in which a line end falls short of an intended connection at another
line end or segment.

Union: The vertical combination of two spatial data layers, typically over the combined extents of
the data layers, and preserving data from both layers.

United States Survey Foot: An official distance unit for survey measurements in the United States
of America that is slightly different in length from the international definition of a foot.

USGS: United States Geological Survey, the U.S. government agency responsible for most civilian
mapping and spatial data development.


UTM: Universal Transverse Mercator coordinate system, a standard set of map projections
developed by the U.S. Army and widely adopted for coordinate specification over regional
study areas. A cylindrical projection is specified with a central meridian for each six degree
wide UTM zone.


Variable distance buffer: A buffering variant where the buffer distance depends on some value or
level of a feature attribute.


Vector data model: A representation of spatial data based on coordinate location storage for
shape-defining points and associated attribute information.


Vertical datum: A reference surface against which vertical heights are measured.

Vertex, vertices: Points used to specify the position and shape of lines.

Very Long Baseline Interferometry (VLBI): A measurement system collecting signals from distant
quasars that allows millimeter-level measurements of position through time. It forms
the foundation of our positional measurement system.


Virtual reference station (VRS) network: A set of rather closely-spaced GNSS base stations plus 
communication links that simplify and standardize real-time differential correction within а 
region. 

WAAS: Wide Area Augmentation System, a satellite-based transmission of correction signals to 
improve GPS pion eter largely боор) he removal of ionsopheni and me 


UIM 


Wavelength: The distance between peak energy values in an electromagnetic wave. 

WGS84: World Geodetic System, an Earth-centered reference ellipsoid used for defining spatial
locations in three dimensions. Very similar to the GRS80 ellipsoid. Commonly used as a basis
for map projections.

Zenith angle: The angle measured between a vertical line upward from a point on the Earth and the
line from that point to the Sun.
Appendix B: Useful Conversions and Information


Horizontal angles in a projected coordi- 
nate system - Azimuth on a flat 
map: 


Length

1 meter = 3.28083333333 U.S. survey feet

1 kilometer = 1000 meters 

1 kilometer = 0.62137 miles 

1 mile = 5280 feet 


Area 

1 hectare = 10,000 square meters 

1 square kilometer = 100 hectares 

1 acre = 43,560 square feet 

1 square mile = 640 acres 

1 hectare = 2.47 acres 

1 square kilometer = 0.3861 square miles


[Table: ground distance represented by 1 centimeter on a map at several scales (50 meters, 250 meters, 500 meters, 3000 meters); the scale column is not recoverable.]


Angles

1 degree = 60 minutes of arc
1 minute = 60 seconds of arc
decimal degrees = degrees + minutes/60 + seconds/3600
180 degrees = 3.14159 radians
1 radian = 57.2958 degrees
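
The angle conversions above can be sketched in a few lines of Python. The function name dms_to_decimal is an assumption made for this example, and the sketch takes its sign from the degrees value, so an angle such as -0° 30' would need separate handling.

```python
import math


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0):
    """Decimal degrees = degrees + minutes/60 + seconds/3600.

    The sign is taken from the degrees value; minutes and seconds are
    assumed to be non-negative.
    """
    sign = -1.0 if degrees < 0 else 1.0
    return sign * (abs(degrees) + minutes / 60.0 + seconds / 3600.0)


if __name__ == "__main__":
    dd = dms_to_decimal(182, 19, 22)
    print(round(dd, 5))                   # decimal degrees
    print(round(math.radians(dd), 6))     # the same angle in radians
    print(round(math.degrees(1.0), 4))    # degrees in one radian
```

The standard library functions math.radians and math.degrees apply the 180 degrees = pi radians relationship directly, so no conversion constant needs to be typed by hand.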


Spherical angles on a globe:


[Table: ground distance represented by 1 inch on a map; values not recoverable.]

Trigonometric Relationships


For a right triangle with hypotenuse H, side A opposite angle α, and side B adjacent to angle α:

sine (α) = A/H
cosine (α) = B/H
tangent (α) = A/B
cotangent (α) = B/A
secant (α) = H/B
cosecant (α) = H/A
Coordinate Geometry

Coordinate geometry (COGO):

dx = L · sin(θ)
dy = L · cos(θ)
xu = x0 ± dx
yu = y0 ± dy

where L is the measured distance, θ is the angle from the north-south axis, and the signs depend on the quadrant of the azimuth.

If we know the location of a point, x0, y0, and have measured the
azimuth and distance to another point, xu, yu, what are the coordinates
for the unknown point, xu, yu?

Suppose x0 = 12, y0 = 3, L = 68, and azimuth = 242°.
From above, we can calculate:
θ = 242 - 180 = 62° (see figure)
dy = 68 · cos(62°) = 32
dx = 68 · sin(62°) = 60
xu = 12 - 60 = -48
yu = 3 - 32 = -29
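
The worked example can be checked with a short Python sketch. Instead of reducing the azimuth to an acute angle and choosing signs by quadrant, this version applies the sine and cosine to the full azimuth (measured clockwise from north), which yields the signs automatically; the function name cogo_point is hypothetical.

```python
import math


def cogo_point(x0, y0, distance, azimuth_deg):
    """Coordinates of an unknown point from a known point, distance, and azimuth.

    The azimuth is assumed to be measured clockwise from grid north, in degrees.
    """
    azimuth = math.radians(azimuth_deg)
    dx = distance * math.sin(azimuth)   # easting change
    dy = distance * math.cos(azimuth)   # northing change
    return x0 + dx, y0 + dy


if __name__ == "__main__":
    xu, yu = cogo_point(12.0, 3.0, 68.0, 242.0)
    print(round(xu, 2), round(yu, 2))
```

With unrounded offsets the result is close to (-48.04, -28.92), which agrees with the rounded values in the example.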
Conversion Between Ellipsoidal and 3-D Cartesian Coordinates


3-D Cartesian from known latitude (φ), longitude (λ): [equations not recoverable from the source; a = Earth semi-major axis, b = Earth semi-minor axis]


Latitude, longitude from known 3-D Cartesian: [equations not recoverable from the source]
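
The forward conversion (latitude, longitude, and ellipsoidal height to 3-D Cartesian) is commonly written with the prime vertical radius N = a / sqrt(1 - e² sin² φ), where e² = 1 - b²/a². The sketch below uses that standard form; it is not necessarily the exact notation of the equations that belong here, and the WGS84 axis lengths are assumed defaults chosen for illustration.

```python
import math

# Assumed defaults: WGS84 semi-major and semi-minor axes, in meters.
WGS84_A = 6378137.0
WGS84_B = 6356752.314245


def geodetic_to_cartesian(lat_deg, lon_deg, height=0.0, a=WGS84_A, b=WGS84_B):
    """Earth-centered X, Y, Z from geodetic latitude, longitude, and height."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = 1.0 - (b * b) / (a * a)                       # first eccentricity squared
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)   # prime vertical radius
    x = (n + height) * math.cos(lat) * math.cos(lon)
    y = (n + height) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height) * math.sin(lat)
    return x, y, z


if __name__ == "__main__":
    print(geodetic_to_cartesian(0.0, 0.0))    # a point on the equator
    print(geodetic_to_cartesian(90.0, 0.0))   # the north pole
```

Two quick sanity checks follow from the geometry: a point on the equator at zero longitude lies one semi-major axis along X, and the pole lies one semi-minor axis along Z.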
Conversion Between Spherical and 3-D Cartesian Coordinates

3-D Cartesian from known latitude (φ), longitude (λ), with r the radius of the sphere:

x = r · cos(φ) · cos(λ)
y = r · cos(φ) · sin(λ)
z = r · sin(φ)

Latitude, longitude from known 3-D Cartesian: [equations not recoverable from the source]


[Figure: Conventions typically adopted for Earth-centered coordinate systems.]
Distance between two points, P (xp, yp) and W (xw, yw):

d = sqrt((xp - xw)^2 + (yp - yw)^2)


Useful relationships for oblique triangles:

For a triangle with sides A, B, and C opposite angles α, β, and γ:

Law of sines

A / sin(α) = B / sin(β) = C / sin(γ)

Law of cosines

A² = B² + C² - 2BC · cos(α)
B² = A² + C² - 2AC · cos(β)
C² = A² + B² - 2AB · cos(γ)
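
As a small check on the law of cosines, the sketch below solves for a side from two sides and the included angle, and for an angle from three sides. The function names are illustrative, and sides are assumed to lie opposite the angles of the same letter position (A opposite α, and so on).

```python
import math


def side_from_two_sides_and_angle(b, c, alpha_deg):
    """Side A opposite angle alpha: A^2 = B^2 + C^2 - 2BC cos(alpha)."""
    alpha = math.radians(alpha_deg)
    return math.sqrt(b * b + c * c - 2.0 * b * c * math.cos(alpha))


def angle_from_three_sides(a, b, c):
    """Angle alpha (degrees) opposite side A, from the rearranged law of cosines."""
    return math.degrees(math.acos((b * b + c * c - a * a) / (2.0 * b * c)))


if __name__ == "__main__":
    # A right angle reduces the law of cosines to the Pythagorean theorem.
    print(side_from_two_sides_and_angle(3.0, 4.0, 90.0))
    print(angle_from_three_sides(5.0, 3.0, 4.0))
```

The minus sign matters: when the included angle is 90 degrees the cosine term vanishes, and for angles larger than 90 degrees the negative cosine makes the opposite side longer than the Pythagorean value.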
Deflection of Curvature


[Figure: a horizontal line at point A and the Earth surface curving away from it toward point B.]


Surface Arc Distance vs. Planar (map) Distance

[Figure: horizontal distance AB compared with the surface arc from A to B.]

AB = R · sin(θ)

The angle θ is expressed in radians
Appendix C: Answers to Selected Study Questions


Chapter 1


1.2: I recently used Google Maps (http://maps.google.com/maps?tab=w) to help
plan a trip to Granada, Spain. Data collection consisted of a search with the
keywords Granada, Hotel, and Spain. The analysis consisted of the hotel quality
ranking, location relative to sites I wanted to visit, and cost. Communication
involved sending an image and map to friends.


1.4: GIS software differs from other software primarily in tracking geographic
coordinate location, tying these locations to attribute data, and storing and
processing large quantities of data. While many software packages are designed to
store and analyze large volumes of data (e.g., video editing), and some other
software packages focus on coordinates (e.g., computer assisted design programs for
three-dimensional objects), GIS records coordinates that are tied to real
physical locations. Coordinates are defined relative to a physical origin, usually
some point on the Earth's surface, or near the center of the Earth, and
stored in the computer. Sets of points are combined to characterize the location
and shape of geographic features, and non-spatial attributes are associated
with these features.


1.6: By our definition in this chapter, paper records and maps are not a GIS,
because they are not computer based. However, they do serve in our collection,
storage, analysis, and output of spatial data and information, so some
would argue that they are a GIS, just an extremely low technology version.
Chapter 2


2.2: Our multiple levels of abstraction from the physical, "real" world usually
include a data model, data structures, and machine code. The data model
describes the real-world objects with a subset of simple objects and relationships.
Data models typically encompass our mental image of how the real
world entities are connected, shaped, or related. These models may often be
illustrated by box and arrow diagrams. Data structures are how these objects
are organized in a computer, for example, what parts go in what files, or how
the files are linked one to another. Machine code is the 0's and 1's used to
store information.


2.4: a) interval/ratio, b) nominal, c) nominal, or ordinal (if read along a brightness
gradient), d) ordinal, e) nominal, f) interval/ratio.


2.6: a, b, f


а) 082525625, b) 2.717896524, c) -1.94088466, 
d) 0.24064227, e) -72.192682, 0128.93670 


240: 
Point DMS [Decimal Degrees 
1 364517 | 3675333 
2 114582” | 114.9672 
3 85197 85.31816 
4 140033 | 1400917 
5 275/3000 | 27500001 
€ 05943 0.99528 
7 18271922" 182.32278 
Point | Arc Angle (degrees) | Radians      | Arc Distance (m)
1     | 1.0                 | 0.01745293   | 111,318.8
2     | 0.01666667          | 0.0002909    | 1,855.3
3     | 0.00027778          | 0.00000485   | 30.9
4     | 3.21611111          | 0.056131728  | 358,013.8
5     | 0.5                 | 0.00872665   | 55,659.4
6     | 0.0125              | 0.0002181677 | 1,391.5


2.14: а) 558.48 km, 4,753.72 km, 9,523.1 km 


2.16: Topology is the study of spatial relationships, and it is important in GIS
because topological vector data structures have certain positive properties.
Topological relationships such as adjacency, connectivity, proximity, and
overlap are important in structuring and analyzing data, and are often helpful
in ensuring data quality. Many topological characteristics are invariant to
warping or bending spatial features. This is important because we often warp
spatial data through map projections.


a) 1, b)2, 92. 1. e)2 
2.22: Mixed cells may be a problem under several conditions, for example, when
there are very different values for materials within the cell, or when we are
interested primarily in one factor that is a minority presence in a cell but not
recorded. Mixed cells may be addressed by decreasing the cell size, by carefully
developing the assignment rule for cell values when there are mixed
constituents, or by recording multiple attributes for each cell, including the
identity and proportion of values in each cell.


224: Water are E, H, F, and J. 
2.26: [Figure: raster grids and associated attribute tables; cell values not recoverable.]

2.28: [Figure: one-to-one and one-to-many attribute tables listing cell-ID and count, with rows starting from the upper-left corner; cell values not recoverable.]
2.30: An object data model defines "natural" objects, from the point of view of
the model designer, that encompass spatial and attribute properties, as well
as functions or operations that may be specific to that object. Rather than
breaking data into thematic layers, components from many themes may exist
within an object. Objects may relate to other objects through specific or
specialized, unique correspondences or connections.


232: a) 1b) 10111) 100000000 4) 100 e)1011 1010 g) 11 h) 10100 


234:4)5 b)1 DIS 945 e)13 DII. g)129 h)255 
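
Conversions between binary and decimal numbers, as in the two answers above, can be checked with Python's built-in int and format functions. The helper names below are illustrative, and the values in the usage lines are examples rather than a transcription of the answer key.

```python
def to_binary(number):
    """Binary digits of a non-negative decimal integer, as a string."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return format(number, "b")


def from_binary(bits):
    """Decimal value of a string of binary digits; spaces between groups are ignored."""
    return int(bits.replace(" ", ""), 2)


if __name__ == "__main__":
    print(to_binary(255))             # eight 1 bits
    print(from_binary("1011"))        # 8 + 2 + 1
    print(from_binary("1000 0001"))   # 128 + 1
```

Each binary digit stands for a power of two, so reading "1011" from the right gives 1 + 2 + 0 + 8.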


2.36: We compress data when data volumes are too large, particularly for raster
data sets. Cells are recorded for each location in a raster area, and gigabytes
to terabytes are often stored. Vector data sets typically record shape-defining
locations, and only where features of interest occur, for example, a road line.
This contrasts with a raster representation, which records a set of cells for a
road, plus cells for the surrounding area where there is no road.


2.38: Run length codes, by row, are:
a
Chapter 3


3.2: a) 6,372,400 b) 6,356,500 c) 6,344,647


3.4: An ellipsoid is a solid shape based on the
rotation of an ellipse. An ellipse is a near
circular shape, defined by the equation
below, where x0 and y0 are the center of the
ellipse and r1 and r2 specify how large and
flattened the ellipse is. An ellipse becomes
a circle when r1 = r2, and a spheroid is a
solid based on the rotation of a circle.


Ellipse equation: (x - x0)^2 / r1^2 + (y - y0)^2 / r2^2 = 1


3.6: A geoid is usually defined as a gravitational equipotential surface chosen as
our base for measuring heights. It is an approximately spherical surface for
which the force of gravity is a specified constant value. An ellipsoid is a
mathematically defined surface, while a geoid is measured, and represents a
natural force. The surface of the earth is also a measured surface, but doesn't
correspond to a gravitational surface because geologic forces have pushed
surface materials above and eroded them below any given equipotential
value near the earth surface. We have measured the geoid both from near-surface
instruments called gravimeters that measure the gravitational force, and
by gravity effects on satellite motion through space.


3.8: Magnetic north is at the point where lines of magnetic attraction converge,
and a weightless magnet, if suspended in a frictionless medium, would point
straight down towards the center of the Earth. Magnetic north is currently
located near Greenland.

The geographic north pole is the northern intersection of the Earth's axis of
rotation with the Earth's surface. It is located in the Arctic Ocean.


3.10: Multiple datums exist because we have improved datums through time, and
because we develop different datums for different purposes. Datums are
required for measurements, and so most governments estimated datums
when a sufficient number of points were surveyed. Additional points with
improved methods have led to subsequent estimations, or versions, of national
datums, in most cases with higher accuracies. Satellite and other measurement
capabilities developed in the second half of the 20th century added
global datums, increasing the number of available datums for most locations.
3.12: Sea level is no longer used as a reference height because sea level is rising,
so the mean height would depend on the length of the measurement record
and is not stable, there are local variations due to persistent currents, changes
in temperature, or changes in salinity, and sea level varies on a 19 year tidal
cycle. Technologies have developed such that we can quickly and easily
measure accurate height differences smaller than these sources of sea level
3.22: MHW = 4.99 above gauge 0, NAVD88 at gauge is 0.43 above gauge zero.
So MHW is 4.99 - 0.43 above NAVD88, which is 4.56. Hospital should be at
30+ 4.56, or 4.56 ft


3.24: A developable surface is a mathematical, geometric surface onto which
points are projected from a spheroid or ellipsoid. This developable surface
may be mathematically "unrolled" to depict a flat map. Planes, cones, and
cylinders are the most common developable surfaces.


3.26: As of December 2015:
Denver, Colorado: NAD83(2011), NAVD88
Latitude and longitudes are 39 45 14.29588(N) 104 53 00.96531(W)


Loma East, California: NAD83(2011), NAVD88
Latitude and longitudes are 32 40 14.00209(N) 117 14 27.75333(W)


Austin CE, Texas: NAD83(2011), NAVD88
Latitude and longitudes are 30 16 48.04361(N) 097 44 16.30349(W)


3.28: The great circle distances are: 
Denver to Loma East: 1,359 km. 
Denver to Austin CE: 1,239 km.
Austin CE to Loma East: 1,868 km. 
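
One way to reproduce great circle distances like these is the haversine formula on a sphere. The sketch below is an assumption about method, not necessarily the calculation used for the answers: it takes the three station coordinates listed in answer 3.26 and a mean Earth radius of 6371 km, so its results differ from the listed distances by a kilometer or two (a radius near 6378 km gives closer agreement).

```python
import math


def dms(degrees, minutes, seconds, negative=False):
    """Decimal degrees from degrees, minutes, seconds; negative for W or S."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great circle distance between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))


# Station coordinates as (latitude, longitude), west longitudes negative.
DENVER = (dms(39, 45, 14.29588), dms(104, 53, 0.96531, negative=True))
LOMA_EAST = (dms(32, 40, 14.00209), dms(117, 14, 27.75333, negative=True))
AUSTIN_CE = (dms(30, 16, 48.04361), dms(97, 44, 16.30349, negative=True))

if __name__ == "__main__":
    print(round(great_circle_km(*DENVER, *LOMA_EAST)))
    print(round(great_circle_km(*DENVER, *AUSTIN_CE)))
    print(round(great_circle_km(*AUSTIN_CE, *LOMA_EAST)))
```

The radius_km parameter is exposed so that the effect of the assumed Earth radius on the result can be tried directly.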


3.30: The UTM coordinate system defines map projections for all portions of the
globe. Areas between 80° S latitude and 84° N latitude are divided into 6°
wide zones, each zone running from the equator to the northern or southern
limit. Separate transverse Mercator projections are fit to each zone. Negative
zone values are avoided by specifying false eastings and northings, coordinate
values added to intermediate projection coordinates.
3.32: Benin, Israel - Transverse Mercator, because both are narrow, with the main
territorial axes oriented north-south.
Bhutan - Lambert conformal conic, because it is relatively narrow and
has its main axis oriented east-west.
Slovenia - either Lambert conformal conic or azimuthal, because although it
has a slight east-west elongation, the shape is somewhat round, and so
could be well represented by either form.


3.34: The Public Land Survey System (PLSS) is a systematic subdivision of land
carried out in the U.S. for the purpose of uniquely identifying property
boundaries. Principal meridians and baselines are established, and township
and range lines surveyed parallel to these at 6 mile intervals. The township/
range grid is further subdivided into 1 mile squares, in turn subdivided into
smaller units. The PLSS is not a coordinate system.


Appendix C: Answers to selected questions


Chapter 4 


4.2: a, c, and e


4.4: a) exaggeration, b) simplification, d) simplification, c) omission


4.6: A computer screen is now the most common map medium; millions of maps
are rendered each hour through applications like Google Maps and MapQuest.
Paper is the second most common medium, and is most used when a hardcopy
form is required.


4.8: A large-scale map typically shows more detail because each feature is drawn
larger, and there is more opportunity to show variation in shape.


4.10: Complete the following table that shows scale measurements and calculations.

Ground distance and units | Corresponding map distance and units | Map scale
1.7120 kilometers | 1.685 inches | 1:40000.935
234 kilometers | 117 centimeters | 1:200000
amies | эзе | nonem
1020 meters | 18.55 centimeters | 1:5500
32008 miles | 1024 inches | 1:2,000,000
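
Each row follows from the definition of scale as the ratio of map distance to ground distance in the same units. A minimal sketch of that arithmetic is given below, using the 234 kilometer and 117 centimeter row; the unit table and function names are chosen here for illustration.

```python
# Meters per unit, used to put both distances in common units.
METERS_PER_UNIT = {
    "meters": 1.0,
    "kilometers": 1000.0,
    "centimeters": 0.01,
    "inches": 0.0254,
    "miles": 1609.344,
}


def scale_denominator(ground, ground_units, map_distance, map_units):
    """Return N for a 1:N map scale."""
    ground_m = ground * METERS_PER_UNIT[ground_units]
    map_m = map_distance * METERS_PER_UNIT[map_units]
    return ground_m / map_m


def ground_distance(map_distance, map_units, denominator, ground_units):
    """Return the ground distance represented by a map distance at scale 1:denominator."""
    map_m = map_distance * METERS_PER_UNIT[map_units]
    return map_m * denominator / METERS_PER_UNIT[ground_units]


print(round(scale_denominator(234, "kilometers", 117, "centimeters")))
print(round(ground_distance(117, "centimeters", 200000, "kilometers"), 3))
```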


4.12: a - undershoot; b - undershoot; c - pseudonode; d - overshoot; e - undershoot;
f - missing label; g - overshoot.


4.14: Snap distance 5 (top) and 10 (bottom)

[Figure: snapping results for a 5 unit radius (top) and a 10 unit radius (bottom)]


4.16: A spline is a line smoothly fit through a set of points. Splines are used to
increase vertex density without substantially slowing the digitizing process,
particularly for smoothly curving features, such as river meanders or winding
roads. Splines fit piecewise polynomial functions while imposing smooth
join points.


4.18: Manual digitizing involves fixing a map or displaying a scanned image, and
manually positioning a pointing device to indicate the location of each node,
vertex, or other shape-defining coordinate. Digitized data are in vector form.
Scan digitizing uses a machine to record differences in colors or brightness
for a map or image document, usually into a raster grid. Lines, points, and
areas are defined by some thresholding technique, and lines or points may be
thinned and converted to a vector form, as needed. Manual methods have the
advantage of low costs for small maps, inexpensive equipment requirements,
feature interpretation by humans when using substandard maps, and relatively
little training. Scan digitizing is inexpensive for large numbers of very
detailed maps, may be automated, and may be more consistent.


4.20: Map registration fixes a map or image to a ground coordinate system so that 
the coordinates of any point in the media may be determined easily. The pro- 
cess consists of identifying control points that are visible in both the image/ 
map and on the ground, collecting coordinates of these points in both the 
image/map system and the projected "ground" coordinate system, fitting a 
system of transformation equations to the coordinate data sets, and applying 
these transformation equations to the image to convert it to the projected 
ground coordinate system. 


4.22: An affine transformation uses a system of linear equations to estimate the
ground easting (E) and northing (N) values from image x and y values. The
equations are of the form:

$E = T_E + a_1 x + b_1 y$
$N = T_N + a_2 x + b_2 y$

where $T_E$ and $T_N$ are translations and $a_1$, $b_1$, $a_2$, and $b_2$ are
estimated coefficients.

This is a linear transformation because the x and y variables are not multiplied
together or raised to a power larger than 1, by definition the equation of
a straight line.
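
A minimal sketch of applying such a transformation is shown below, assuming the common six-parameter form E = T_E + a1*x + b1*y and N = T_N + a2*x + b2*y. The coefficient values are hypothetical and chosen only for illustration; in practice they are estimated from control points, typically by least squares.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """Six affine coefficients (field names are illustrative)."""
    t_e: float  # translation in easting
    a1: float
    b1: float
    t_n: float  # translation in northing
    a2: float
    b2: float


def to_ground(coef, x, y):
    """Estimate ground easting and northing from image x and y."""
    easting = coef.t_e + coef.a1 * x + coef.b1 * y
    northing = coef.t_n + coef.a2 * x + coef.b2 * y
    return easting, northing


# Hypothetical coefficients: 2 m pixels, no rotation, image y axis pointing down.
coef = Affine(t_e=500000.0, a1=2.0, b1=0.0, t_n=4500000.0, a2=0.0, b2=-2.0)
print(to_ground(coef, 100, 50))
```

Because x and y appear only to the first power and are never multiplied together, straight lines in the image remain straight after the transformation.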


4.24: The average positional error is likely to be the same or larger than the
RMSE. The RMSE is usually minimized, or closely related to a minimized
quantity, when statistically fitting the coordinate transformation. If we
collected a representative sample, we expect the RMSE to be approximately
equal to the average error. However, if our sampling was inadequate or
biased, often it is in areas where we have difficulty identifying good control
points, and hence our errors tend to be larger in these locations.


4.26: Transformation b is the most likely to have the lowest average error at
independently measured points. It depends on the distribution and number of
control points, but in most cases higher-order polynomials overfit and, while
exhibiting lower RMSE values, have larger errors at independent points.
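
To make the comparison concrete, the sketch below computes the RMSE at control points and the mean positional error at independently measured check points. All coordinate pairs are made up for illustration, and the function names are not from any particular package.

```python
import math


def positional_errors(predicted, truth):
    """Euclidean distance between each predicted and true (E, N) pair."""
    return [math.hypot(pe - te, pn - tn) for (pe, pn), (te, tn) in zip(predicted, truth)]


def rmse(errors):
    """Root mean square of a list of positional errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_error(errors):
    """Arithmetic mean of a list of positional errors."""
    return sum(errors) / len(errors)


# Hypothetical control points used to fit the transformation.
control_predicted = [(10.0, 10.0), (20.0, 20.0)]
control_truth = [(13.0, 14.0), (20.0, 20.0)]

# Hypothetical independent check points not used in the fit.
check_predicted = [(30.0, 30.0), (40.0, 40.0)]
check_truth = [(36.0, 38.0), (40.0, 45.0)]

print("control RMSE:", rmse(positional_errors(control_predicted, control_truth)))
print("check mean error:", mean_error(positional_errors(check_predicted, check_truth)))
```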


4.28: Metadata are the data about data. They describe the extent, type, coordinate
system, lineage, attributes, and other important characteristics of a spatial
data set. Metadata are required to evaluate the adequacy of a data set for an
intended use.


Chapter 5


5.2: GNSS is based on range distance measurements from multiple satellites to
“triangulate” a location. Orbiting satellites transmit radio signals along with precise
positioning and timing information. The current distance between a satellite and
a receiver is a range measurement. A GNSS receiver combines multiple,
simultaneous range measurements to estimate location in near-real time.


5.4: Typically 4 satellites are required for a 3-dimensional fix, although a fix may be
determined under some assumptions with data collection from 3 satellites over a 
short period of time. 

5.6: GNSS data range in accuracy, from sub-centimeter for the highest accuracy 
using carrier phase methods, to tens of meters using real-time C/A positioning. 
Accuracies are highest when using high quality receiving systems in flat terrain, 
with no buildings, trees, or other structures to block views of the sky. Accuracies
also improve when satellites are widely spaced.


5.8: Figure d depicts the lowest PDOP, with the widest distribution of satellites,
closely followed by figure b. Figure a has the highest PDOP, with the tightest
distribution.


5.10: Differential positioning is based on the simultaneous measurement of GNSS 
signals at both a known base location and at unknown roving stations. The
small errors in range measurement may be calculated for each position
measurement at the base station. These range errors may be applied in reverse
for corresponding rover data, thereby improving the accuracy of position measurements.
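
The idea can be sketched for a single satellite. Because the base station position is known, the true range to the satellite can be computed and compared with the measured range, and the difference is then applied to the rover's measured range. The coordinates and ranges below are hypothetical, and real systems apply such corrections per satellite and per measurement epoch.

```python
import math


def geometric_range(receiver, satellite):
    """Straight-line distance between two (x, y, z) positions in meters."""
    return math.dist(receiver, satellite)


def range_correction(base_position, satellite, base_measured_range):
    """Correction = true range minus measured range at the known base station."""
    return geometric_range(base_position, satellite) - base_measured_range


def corrected_rover_range(rover_measured_range, correction):
    """Apply the base station correction to a simultaneous rover measurement."""
    return rover_measured_range + correction


# Hypothetical values: the base measures a range that is 12.5 m too long.
base = (0.0, 0.0, 0.0)
satellite = (0.0, 0.0, 20_200_000.0)
correction = range_correction(base, satellite, base_measured_range=20_200_012.5)
print(correction)
print(corrected_rover_range(20_200_512.5, correction))
```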


5.12: Dual frequency receivers primarily reduce ionospheric delays, and hence 
uncertainty in position. They may help somewhat with atmospheric delays, but 
typically much less than with ionospheric delays. 


5.14: GNSS accuracy typically decreases as terrain becomes more varied, or when 
canopy or buildings obstruct a portion of the sky. Positional accuracy decreases 
because sub-optimal constellations of satellites are more likely to be observed. 
Satellites are in closer proximity, and measurements are less independent, and 
hence do not reinforce each other to improve accuracy.


5.16: WAAS is the Wide Area Augmentation System, a real-time differential correction
system designed to aid navigation in U.S. civil aviation. Correction factors
are derived from a nationwide network of control stations and broadcast from a
geostationary satellite located over the equator. The system is designed primarily
for aviation and related uses in North America.


5.18: COGO stands for coordinate geometry. COGO is the calculation of coordinates
based on angle and distance measurements.


Chapter 6


6.2: The electromagnetic spectrum is the range of electromagnetic energy
frequencies observed. Broadly, this spans from 0 to infinity; however, we are
most interested in the subset of primary frequencies emitted by the sun.
Specifically, we are interested in the ultraviolet through infrared portions of solar
radiation, from 0.01 through 1000 µm (1,000,000 µm equals 1 meter). Principal
regions of interest are the visible (0.4 to 0.7 µm, approximately equally
split in the blue, green, and red portions of the spectrum), the near infrared
(0.7 to 1.1 µm), and the mid infrared portions (2.5 to 8 µm). Radar wavelengths
are important in remote sensing, most often generated from a device,
and range from 0.75 cm to 1 m.


6.4: Film is a layered sandwich of emulsions on a polyester base material. The
emulsion is sensitive to light, and reacts to darken in a measure proportional
to the amount of light (exposure) the layer receives. Different emulsions are
sensitive to different spectral regions. Panchromatic film is typically
sensitive to visible wavelengths, from 0.4 to 0.7 µm. Color films typically contain
three dye layers, sensitive to the blue, green, and red wavelengths (normal
color), or green, red, and infrared wavelengths (color infrared film). Spectral
sensitivity curves plot the sensitivity versus wavelength.

Digital cameras are similar, except that light is typically split by wavelength
and directed to separate receptor electronics, one for each portion of the
spectrum observed. Light generates a voltage or current proportional to the
light energy, and in this way an image is formed.


6.6: The dimension of a side will be approximately 68.6 meters.


6.8: You should set a flying height of 170 meters.


6.10: The most common band combination is a blue-green-red set that corre- 
sponds to what the human eye observes. Another common combination is a 
green-red-near infrared set, which provides better vegetation discrimination. 


6.12: Distortion magnitude varies with mapping cameras, depending on terrain, 
tilt, camera characteristics, and scale. For vertical photos, typically defined as 
those with camera axis tilt of less than 3 degrees, errors are typically between 
10 and 70 meters over moderate terrain. Errors may be reduced to a meter or 
less by applying a full photo orthocorrection, a process that analytically 
removes most tilt and terrain distortion through the three-dimensional geo- 
metric analysis and transformation. 


6.14: a) tilt, b) terrain, c) camera lens


6.16: Stereo photographic coverage is the intentional overlap of sequential
photographs in a flight line (end lap) and photos in adjacent flight lines (side lap).
Overlap provides views of the same set of objects from two different locations.
These "perspective views" take advantage of a phenomenon called parallax
to reconstruct three-dimensional positions from two-dimensional
images.


6.18: Terrain distortion is removed by applying inverse equations that describe 
the magnitude of terrain-caused distortion. Three-dimensional objects that
are projected onto a two-dimensional plane are shifted horizontally when 
they are at different heights. This shift is also dependent on the angle at 
which the objects are viewed. We may remove the distortion by measuring 
the height of each point and knowing the viewing angle from the camera 
location to each point. 
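
For a vertical photograph, this relationship is commonly written as d = r * h / H, where d is the displacement on the photo, r is the radial distance from the nadir point to the displaced image point, h is the height of the object above the datum, and H is the flying height above the datum. The sketch below applies that standard formula; the numbers are hypothetical and tilt is assumed to be zero.

```python
def relief_displacement(radial_distance, object_height, flying_height):
    """Displacement on the photo, in the same units as radial_distance."""
    return radial_distance * object_height / flying_height


def corrected_radial_distance(radial_distance, object_height, flying_height):
    """Move an elevated image point back toward the nadir by its displacement."""
    return radial_distance - relief_displacement(radial_distance, object_height, flying_height)


# Hypothetical case: a point imaged 80 mm from nadir, 150 m above the datum,
# photographed from 3000 m above the datum.
print(relief_displacement(80.0, 150.0, 3000.0))        # displacement in mm
print(corrected_radial_distance(80.0, 150.0, 3000.0))  # corrected distance in mm
```

The displacement grows with both the height of the point and its distance from the nadir, which is why height measurements and viewing geometry are both needed to remove it.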


6.20:


6.22: Photointerpretation is the process of converting images into spatial informa- 
tion, typically by an experienced human analyst, or photointerpreter. The 
photointerpreter uses size, shape, color, brightness, texture, and location to 
assign or identify characteristics to features of interest. 


6.24: The four systems, from Landsat ETM+ through Quickbird, represent a range
of resolutions (from 30 m through 0.6 m), spectral ranges (full color through
near and mid infrared), per scene coverage (from tens of thousands of square
kilometers through a few tens of square kilometers), and repeat times (from more
than two weeks to less than two days). Finally, costs rise markedly along this
gradient. Although near the end of its functional life at the time of this writing,
the ETM+ data were available for hundreds of dollars for a full scene, while 
the higher resolution data were from tens to hundreds of thousands of dollars 
for an equivalent area. 


6.26: Image types are selected if they measure the phenomena of interest to the
required level of spatial and attribute accuracy, are within the technical
capabilities of the organization, have an acceptable probability of successful data
collection, and fit with the available budget for acquisition and processing.


Chapter 7


7.2: Do the data provide the required information, for the required area, at the
necessary level of categorical and spatial detail, and at the accuracy needed 
for the intended use? 


7.4: Edge-matching is the process of ensuring consistency in features across the
edges of mapping projects, areas, and physical maps. When adjacent areas
are mapped at different times, by different methods, or by different people,
there may be incongruent features on either side of the mapping boundary.
Roads may not match in location or type, rivers may end abruptly, or the
vegetation or elevation may change in an impossible manner. Edge-matching
attempts to resolve these differences and, if possible, remove errors across
mapping boundaries.


Chapter 8


8.2: Database management systems are computer software tools that aid in the 
entry, organization, analysis, distribution, and presentation of data.


8.4: A one-to-one relationship among tables means that for every row in one table
that in some way is matched to a row in another table, there is only one row
in the second table that matches. A many-to-one relationship means that one
row in a table may match many rows in a second table. Note that by match,
we do not mean completely match. Usually we are using a column in each 
table to match the tables: the rows are considered to match when the match 
column has the same value in both tables. 
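
A small sketch of matching rows on a shared column is given below, using plain Python lists of dictionaries in place of database tables. The table contents and column names (county_id and so on) are invented for illustration.

```python
from collections import defaultdict


def join(left_rows, right_rows, key):
    """Pair every left row with each right row that has the same key value."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


counties = [{"county_id": 1, "county": "Adams"}, {"county_id": 2, "county": "Burke"}]

# One matching row per county: a one-to-one relationship.
seats = [{"county_id": 1, "seat": "Alder"}, {"county_id": 2, "seat": "Birch"}]

# Several rows may match one county row.
parcels = [
    {"county_id": 1, "parcel": "P-10"},
    {"county_id": 1, "parcel": "P-11"},
    {"county_id": 2, "parcel": "P-20"},
]

print(len(join(counties, seats, "county_id")))    # one output row per county
print(len(join(counties, parcels, "county_id")))  # one output row per parcel
```

Only the match column needs to agree; the other columns in the two tables are free to differ.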


8.6: Osel, NumT


8.8: The eight basic operations are illustrated in the section “Primary Operations” 
in Chapter 8. They are restrict, project, product, divide, union, intersect, dif- 
ference, and join. 


8.10: Sets from OR conditions will have the same number or more members than
the component conditions. 


8.12: a) Florida, Georgia, Iowa, Minnesota, Wisconsin. 
b) Iowa, Minnesota, Wisconsin. 
c) Alabama, Alaska, Florida.
d) Minnesota, Alaska.
e) Iowa, Oklahoma.
f) Minnesota, Wisconsin.


8.14: Normal forms are a way of organizing database tables. When followed, they
optimally structure tables to remove redundancies, efficiently store data, and
organize data in “natural” groupings that speed analysis and increase flexibility.


8.18: A functional dependency occurs when knowing the value of one variable
defines the value of another variable. For example, if I know that a
person was a German citizen in 2014, then I know that their Chancellor was
Angela Merkel, or if a person is a Chicago Cubs fan in 2015, then their
team has the longest active period without winning the baseball World Series.
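
Whether a functional dependency holds in a given table can be checked mechanically: every pair of rows that agree on the determining columns must also agree on the dependent columns. The sketch below is one simple way to test this; the rows and column names are invented, and a passing test only shows that the sample data do not violate the dependency.

```python
def determines(rows, determinant, dependent):
    """Return True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"person": "Anna", "country": "Germany", "year": 2014, "chancellor": "Angela Merkel"},
    {"person": "Bernd", "country": "Germany", "year": 2014, "chancellor": "Angela Merkel"},
]

print(determines(people, ["country", "year"], ["chancellor"]))  # country, year -> chancellor
print(determines(people, ["chancellor"], ["person"]))           # chancellor -> person
```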


8.20: ID -> Size, Color;
Size -> Color;
Source -> ID, Size, Shape, Color, Age


Chapter 9


9.2: Selection operations apply criteria to features, and identify features that meet 
those criteria. The criteria may apply to spatial characteristics, for example, 
the size, shape, or location of a polygon; they may apply to non-spatial attributes
of the features, for example the value or condition of an attribute.


9.4: a) B or C; b) A and B; c) [A and B] and not C; d) [B or C] and not [B and C]
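
These expressions map directly onto set operations. A brief sketch with made-up feature IDs in sets A, B, and C:

```python
# Hypothetical sets of feature IDs selected by three separate conditions.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 7}

a = B | C              # a) B or C
b = A & B              # b) A and B
c = (A & B) - C        # c) [A and B] and not C
d = (B | C) - (B & C)  # d) [B or C] and not [B and C]

print(sorted(a), sorted(b), sorted(c), sorted(d))
```

Expression d is the symmetric difference, which Python also writes as `B ^ C`. Note also that an OR result is never smaller than either of its component sets.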


9.8: Large are white, medium light gray, small darker gray.


9.10: The modifiable areal unit problem arises because statistics for aggregated
areas depend on the aggregations. We may combine adjacent areas, and cal- 
culate sums, means, medians, and other attributes of the areal units. If we are 
selective about how we aggregate, we may change these statistics solely by 
changing the aggregation units. This is the modifiable areal unit problem. The 
zoning effect is how aggregate statistics change with zone boundaries. The 
area effect is how statistics change when changing the size of aggregation 
areas.
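
A toy illustration: the same six unit values, aggregated with different zone boundaries or different zone sizes, give different zone means. The values and zone assignments below are invented.

```python
from statistics import mean

values = [2, 4, 6, 8, 10, 30]  # hypothetical values for six areas in a row


def zone_means(values, zones):
    """Mean of the values in each zone; zones is a list of lists of indices."""
    return [mean(values[i] for i in zone) for zone in zones]


zoning_a = [[0, 1, 2], [3, 4, 5]]    # two zones, boundary after the third area
zoning_b = [[0, 1, 2, 3], [4, 5]]    # two zones, boundary moved (zoning effect)
zoning_c = [[0, 1], [2, 3], [4, 5]]  # three smaller zones (area effect)

for zoning in (zoning_a, zoning_b, zoning_c):
    print(zone_means(values, zoning))
```

Nothing about the underlying data changes between the three runs; only the aggregation units do.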


9.16: Multi-distance, retain, exterior.


9.18:

ID | distance
1 | 250
2 | 0
3 | 1500
4 | 50
5 | 0
6 | 50
7 | 250
8 | 250


9.20: The minimum dimension (point, line, or polygon) is chosen because to do
otherwise courts ambiguity. If two lower-dimension features are coincident
with higher-dimension features, it is unclear how the attributes should be
recorded in the resultant features. For example, if two points fall within a
polygon, the polygon attributes may be unambiguously associated with each
point. It is unclear, or at best cumbersome, how to assign both sets of point
attributes to an output polygon.


9.30: Network models are connected linear graphs through which resources flow,
or to which movement may be constrained. There may be both source and
demand features connected to these networks. Networks are different from
many other spatial models in that movement or occurrence is limited to the
network, and they often track time-varying


9.32: Note that answer values in cases with large address ranges may be off by one address
wit 


9.34:


b) dissolve.


Chapter 10


10.2: Compatible cell sizes are often required in raster operations because
otherwise the input is often ambiguous. If one input cell is substantially larger or
mismatched to another, it may be uncertain which input cell value to choose.


10.4:


10.6: C1 = 1, C2 = null, C3 = 0, C12 = 0.


10.8: C1 = 1, C3 = 1, C4 = 0, C10 = 4


10.10: Any nested operation, something like con(isnull(layer1), layer2, layer3), or
sqrt(abs(layer1)).


10.12: C7 = null, C8 = null, C13 = 1, C16 = null. 


10.14: In most systems, a NULL value is returned when it appears as any input of 
an operation, unless there are explicit instructions to ignore null values.


10.18: A clip function may be implemented in a raster environment by creating a
clip layer that has cells with a value of 1 wherever data in the target layer
are to be kept, and 0 cell values corresponding to areas where the target layer
data are to be discarded.
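
A minimal sketch of such a clip, with rasters held as nested Python lists. The cell values are invented, and discarded cells are set to None here as a stand-in for a null value; a real system would use its own nodata convention.

```python
def clip_raster(target, clip, nodata=None):
    """Keep target cells where the clip layer is 1; set all other cells to nodata."""
    return [
        [t if c == 1 else nodata for t, c in zip(target_row, clip_row)]
        for target_row, clip_row in zip(target, clip)
    ]


target = [[7, 8, 9],
          [4, 5, 6],
          [1, 2, 3]]

clip = [[1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]

for row in clip_raster(target, clip):
    print(row)
```

Multiplying the two layers cell by cell is a common alternative, but it leaves zeros rather than nulls in the discarded area, which can be confused with true zero values.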


10.20: The kernel is a high-pass filter. It would highlight local differences in the
target raster, with values changed little in areas where cell values are about 
the same, and exaggerating values that are higher or lower than their neigh- 
bors. 
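
The behavior can be seen with a generic 3 by 3 high-pass kernel. The kernel below (center weight 9, all neighbors -1, so the weights sum to 1) is a common textbook example and is an assumption here, not necessarily the kernel given in the question.

```python
# Assumed example kernel: weights sum to 1, so uniform areas are unchanged.
KERNEL = [[-1, -1, -1],
          [-1,  9, -1],
          [-1, -1, -1]]


def filter_cell(raster, row, col, kernel=KERNEL):
    """Weighted sum of the 3x3 neighborhood centered on an interior cell."""
    return sum(
        kernel[i][j] * raster[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


flat = [[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]]

spike = [[5, 5, 5],
         [5, 8, 5],
         [5, 5, 5]]

print(filter_cell(flat, 1, 1))   # uniform neighborhood
print(filter_cell(spike, 1, 1))  # center cell higher than its neighbors
```

The uniform neighborhood returns its original value, while the cell that is only 3 units above its neighbors is pushed far above them.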


10.22: High spatial covariance means cells near each other tend to have similar
values: low values tend to be clustered near low values, and high values
clustered near other high values.


Chapter 11


11.2: Digital elevation models are created by a variety of methods. Leveling or
other ground surveys are used to measure relative height differences across
profiles, using optical and electronic instruments to measure distance and
vertical and horizontal angles. Photo-based methods rely on parallax, the
relative displacement of objects depending on their distance from an observation
point. Downward looking images taken from aircraft or satellites may be
combined with knowledge of aircraft position and ground surveys to create
DEMs. Laser and radar measurements from airborne platforms are a third
common method for DEM creation. Return times are recorded for
electromagnetic signals sent from the aircraft and used to calculate terrain height
relative to the aircraft. These are combined with precise positioning information
for the aircraft and with previous ground surveys to produce accurate DEMs.


11.6: Slope and aspect, four nearest neighbor method.


11.8: Slope and aspect, third-order finite difference method.


11.10: Slope as percent is larger over most of the possible range.
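
The two slope methods named above, and the percent versus degree comparison, can be sketched for a single 3 by 3 elevation window. The window values and cell size are hypothetical. Row 0 is assumed to be the northern row and column 0 the western column, and aspect is reported here as the downhill direction in degrees clockwise from north; other software may use different conventions.

```python
import math


def four_neighbor_gradients(z, cell):
    """dz/dx and dz/dy from the four nearest neighbors of the center cell."""
    dz_dx = (z[1][2] - z[1][0]) / (2 * cell)  # east minus west
    dz_dy = (z[0][1] - z[2][1]) / (2 * cell)  # north minus south
    return dz_dx, dz_dy


def third_order_gradients(z, cell):
    """dz/dx and dz/dy from all eight neighbors, nearest neighbors weighted double."""
    east = z[0][2] + 2 * z[1][2] + z[2][2]
    west = z[0][0] + 2 * z[1][0] + z[2][0]
    north = z[0][0] + 2 * z[0][1] + z[0][2]
    south = z[2][0] + 2 * z[2][1] + z[2][2]
    return (east - west) / (8 * cell), (north - south) / (8 * cell)


def slope_and_aspect(dz_dx, dz_dy):
    """Return slope in degrees, slope in percent, and downhill aspect in degrees."""
    rise_over_run = math.hypot(dz_dx, dz_dy)
    slope_deg = math.degrees(math.atan(rise_over_run))
    slope_pct = 100.0 * rise_over_run
    # Aspect is undefined on flat cells; this expression is only meaningful when slope > 0.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, slope_pct, aspect


# Hypothetical window: elevations rise 10 m per 10 m cell toward the east.
window = [[0, 10, 20],
          [0, 10, 20],
          [0, 10, 20]]

print(slope_and_aspect(*four_neighbor_gradients(window, 10.0)))
print(slope_and_aspect(*third_order_gradients(window, 10.0)))
```

On a perfectly planar window the two methods agree; they differ on rough terrain, where the third-order method averages over more neighbors.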




11.14: The formula for contour calculations is:

$d_c = d_{ab} \, \dfrac{H_a - H_c}{H_a - H_b}$

where $d_c$ is the distance from the upper point to the point on the contour, $H_a$
and $H_b$ are the upper and lower elevations at known points, $H_c$ is the contour
elevation, and $d_{ab}$ is the distance between points $H_a$ and $H_b$.


J: A solar zenith angle is the angle measured at an observation point between
the vertical line, “straight up,” and the sun's location. The solar azimuth
angle is the angle turned clockwise from geographic north to the sun's loca- 
tion. The solar incidence angle is the angle between an incoming sun's ray 
and the surface normal li 


11.22: A viewshed for a point is the combination of areas that are visible from a
point. Viewsheds are used in landscape design to maximize scenic vistas or 
hide powerlines, roads, or other features, and they are used in telecommuni- 
cations to calculate inter-visibility and communication networks. They are 
calculated by tracing rays from a view point to all viewable locations, using a 
DEM to estimate if the angles to all intervening points are lower than the 
angle to the potentially visible point. 
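
The visibility test along a single ray can be sketched as follows, using a one-dimensional elevation profile sampled from a DEM. The profile values, cell spacing, and observer height are hypothetical, and Earth curvature and refraction are ignored.

```python
def visible_cells(profile, spacing, observer_height=0.0):
    """Return a visibility flag for each cell along a profile, viewed from cell 0."""
    eye = profile[0] + observer_height
    flags = [True]               # the observer's own cell
    max_tangent = float("-inf")  # steepest view angle (as a tangent) seen so far
    for i in range(1, len(profile)):
        tangent = (profile[i] - eye) / (i * spacing)
        # Visible only if every intervening cell subtends a lower angle.
        flags.append(tangent > max_tangent)
        max_tangent = max(max_tangent, tangent)
    return flags


profile = [100.0, 90.0, 110.0, 100.0, 130.0]  # elevations in meters
print(visible_cells(profile, spacing=10.0, observer_height=2.0))
```

A full viewshed repeats this test along rays from the view point to every cell in the DEM.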


Chapter 12


12.2: The four discussed sampling patterns are:
1) random, where points are assigned x and y locations randomly drawn from
those in the area of interest,

2) systematic, where locations are assigned in a fixed pattern, e.g., spaced every
100 meters in the x and y direction from a starting point,

3) clustered, where points are assigned in a systematic or restricted random man- 
ner from each of a set of starting points, and 

4) adaptive, where local sampling density is related to variability, with more 
samples in more variable areas. 


12.4: Adaptive sampling will likely give a better estimate of the surface over all
numbers of sampling points.


12.6:


12.8: a is 18; b is 12; c is 6; d is 10.25; e is 0; f is 22.


12.10:

ID | X | Y | Z
0 | 15395191 | 44787145 | 20406
1 | 1512809 | 44796473 | 18630
2 | 1482285 | 44721434 | 19921
3 | 146.9068 | 44649786 | 25401


12.12: A trend surface interpolator estimates coefficients for an equation globally:
all data are used to estimate the coefficients for a prediction equation that applies
across the entire sample region. A kriging interpolator uses estimates of global
and local variation, specifically spatial autocorrelation, to estimate coefficients
for prediction equations. In this way, samples can influence predictions depending
on the observed spatial autocorrelation and distribution of samples.


12.18: Kernel mapping assigns a density surface around observation points. The
density surface represents the likelihood or probability that the organism or
population occupied the area. “Stacking,” or adding together the density surfaces for a
set of observations, creates a combined probability map, giving an estimate of the
occupancy across combined observations or samples. The density surface is
typically described mathematically by an equation, with the shape controlled by
parameters. There is often a “bandwidth” parameter that affects the
“peakedness” of each individual density surface.
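
A one-dimensional sketch of stacking kernels is given below. It assumes a Gaussian kernel, which is only one of several shapes in common use, and the observation locations and bandwidths are invented.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel shape."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def density(x, observations, bandwidth):
    """Stack one kernel per observation and scale so the total area is 1."""
    n = len(observations)
    stacked = sum(gaussian_kernel((x - obs) / bandwidth) for obs in observations)
    return stacked / (n * bandwidth)


observations = [0.0, 10.0]  # hypothetical observation locations along a line

for bandwidth in (1.0, 3.0):
    at_point = density(0.0, observations, bandwidth)
    between = density(5.0, observations, bandwidth)
    print(bandwidth, round(at_point, 4), round(between, 4))
```

With the wider bandwidth the surface is lower at the observation points and higher between them, so each individual kernel is less peaked.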


12.20: B shows the wider bandwidth, because the distributions are broader around
each point, and the maximum values are lower, indicated by less saturated shading
near the point locations.


Chapter 13


13.2: Criteria often must be refined because they are expressed in a way that may
not be directly applied. Distances or groupings may be ambiguously 
described, e.g., near, far, large, or small, and often these must be quantified 
before entry into a cartographic model. 


13.4: A discrete weighting has specific, distinct categories. Roads may be
passable or impassable for large vehicles, rivers deep or shallow, or forests
evergreen or conifer dominated. Weightings may be defined which are specific to
these categories, e.g., passable roads receive a weight of 0.5, while
impassable roads receive a weight of zero. In contrast, weightings may also be continuous,
in that each road may have a measured width, e.g., 12.4 meters, and the
weight may be some continuous function of this width, e.g., width * 132.7.


13.6: A - original 
B - reclassed by T, with low and high input values reclassed to high output 
values, and intermediate input values reclassed to low output values;
C - reclassed by W, with low input values set to zero when below a threshold,
and then a linear increase in output with input above the threshold;
D - reclassed by R. High values above a threshold are set to zero, and low
values below the threshold are set to 1.


13.8: Flowchart D is the most plausible.

B - Clip of high density by park buffer incorrectly provides the complement
of the distance from current park criteria, selecting the wrong areas; unions
school buffer and HDNP buffer, and would also need a subsequent selection to
extract appropriate areas.

C - Unions high density and park buffer. Would need a select and delete, or a
clip or some other operation, to correctly apply the distance from park criteria.


13.10: Correct answer is C.
A - applies area test too early; later intersections may split candidate
polygons, rendering smaller area polygons than the size limit specified for the
analysis;
B - commits same error, later in the process;
D - omits raster to vector conversion on elevation, and allows areas in same
parcel that are satisfactory and adjacent, but split by other criteria, to be in
different polygons, so may reject acceptable areas.


Chapter 14


14.2: Data transfer is perhaps the process most commonly aided by spatial data
standards. Among other things, standards require that spatial data sources, methods,
and characteristics be described in a consistent manner. Spatial data standards
provide a predictable way of organizing data so that it may be transferred among
organizations. Standards allow data to be transferred without loss of information.


14.4: The mean reports the central error, the average error one would expect when
sampling from a population. A frequency threshold describes the percentage of
errors above or below a value. This gives some general notion of the likelihood 
of large or small errors. 


14.6: Positional accuracy reports how close the represented locations of objects are
to the true locations of the objects. Attribute accuracy reflects how often the
value of a categorical attribute is correct (discrete) or how close an interval/
ratio attribute is to the true value (continuous). Logical consistency does not
imply either spatial or attribute accuracy, but just that multiple themes or types of
data are consistent, e.g., there are no roads in a lake, fires on salt flats, or oil deposits
in granitic rocks.


14.8: The steps in applying the NSSDA are 1) identify test points, 2) identify a source
for "true" points, and extract truth corresponding to test points, 3) calculate the
positional error for each truth/test pair, 4) record the error data in a standardized
table, which includes calculation of error statistics, 5) create the documentation/
metadata describing the accuracy assessment.
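
Steps 3 and 4 can be sketched as below. The test and truth coordinates are invented, and only two pairs are shown for brevity, whereas the NSSDA calls for at least 20 test points. The factor 1.7308 is the multiplier the NSSDA gives for horizontal accuracy at the 95% confidence level; it assumes the x and y errors are of similar size and normally distributed.

```python
import math

NSSDA_HORIZONTAL_FACTOR = 1.7308  # 95% confidence multiplier for horizontal RMSE


def horizontal_rmse(test_points, true_points):
    """Root mean square of the horizontal distances between test and true points."""
    squared = [
        (xt - x) ** 2 + (yt - y) ** 2
        for (xt, yt), (x, y) in zip(test_points, true_points)
    ]
    return math.sqrt(sum(squared) / len(squared))


def nssda_horizontal_accuracy(test_points, true_points):
    """Horizontal accuracy statistic at the 95% confidence level."""
    return NSSDA_HORIZONTAL_FACTOR * horizontal_rmse(test_points, true_points)


# Hypothetical coordinates in meters.
test_points = [(0.0, 0.0), (10.0, 10.0)]
true_points = [(3.0, 4.0), (10.0, 10.0)]

print(round(horizontal_rmse(test_points, true_points), 3))
print(round(nssda_horizontal_accuracy(test_points, true_points), 3))
```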


14.10: Good candidate points are any features that are well-defined and may be visi- 
ble on both the data set to be tested and in the source used for truth. This often 
means constructed features, for example road intersections, curbs, manhole
covers, geodetic markers, utility poles, fire hydrants, or other relatively immobile
features.


14.12: Metadata are the “data about data.” They are important because they describe
the characteristics of any data we might wish to use. They allow us to evaluate
the data suitability for intended uses, maintain the investment over multiple
organizations or changes in personnel, and help us in explaining or describing
our data to others.